Distributing Cached Content in a Network

ABSTRACT

The invention relates to a computer-implemented method, a corresponding a computer program product and a corresponding apparatus for distributing cached content in a network, the computer-implemented method comprising: collecting statistics regarding requests made and paths taken by the requests from source nodes to server nodes via intermediate nodes, the source nodes, intermediate nodes, and server nodes interconnected by edges having queues with respective queue sizes associated therewith, the requests including indications of content items to be retrieved; storing the content items at the server nodes; caching, by the intermediate nodes, the content items up to a caching capacity; and performing caching decisions that determine which of the content items are to be cached at which of the intermediate nodes, based upon costs that are monotonic, non-decreasing functions of the sizes of the queues.

RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No.62/668,617, filed on May 8, 2018. The entire teachings of the aboveapplication are incorporated herein by reference.

GOVERNMENT SUPPORT

This invention was made with government support under Grant No.NeTS-1718355 from National Science Foundation. The government hascertain rights in the invention.

BACKGROUND

Modern computer networks take advantage of routing, caching, andforwarding decisions in order to improve efficiency and packetthroughput and latency. Improvements in these areas are needed.

SUMMARY

Embodiments of the present disclosure are directed to networks(including but not limited to content delivery networks), network nodes,computer methods, systems, and computer program products that operate innetworks that route requests in a network and replicate and storecontent. According to some embodiments, caching and storage decisions,i.e., determining where content should be stored, are determined in amanner that considers link congestion between neighboring nodes andother parameters, such as delay, in the network. Through this method,embodiments of the present disclosure cache and store content in a moreefficient manner than existing methods that do not consider cachingparameters.

In some embodiments, a network manager is configured to collectstatistics regarding requests made and paths taken by the requests fromsource nodes to server nodes via intermediate nodes. The source nodes,intermediate nodes, and server nodes are interconnected by edges havingqueues with respective queue sizes associated therewith. The requestsmay include indications of content items to be retrieved. The contentitems may be stored at the server nodes. The intermediate nodes may beconfigurable to cache the content items up to a caching capacity. Thenetwork manager may be configured to perform caching decisions thatdetermine which of the content items are to be cached at which of theintermediate nodes based upon costs that are monotonic, non-decreasingfunctions of the sizes of the queues.

In some embodiments, the network manager may be configured to performthe caching decisions further based on a greedy computer-implementedmethod. In some embodiments, the costs may be associated with one ormore of the edges.

In some embodiments, the costs may be determined based upon a Taylorapproximation. In some embodiments, the costs may be expected costs.

In some embodiments, the network manager may be configured to performthe caching decisions further based on marginal gain. In someembodiments, the marginal gain may be associated with the costs withrespect to the determination of which of the content items are to becached.

Certain example embodiments described herein are in reference to anetwork manager embodiment, but pertain similarly to the computermethods, network nodes, networks, systems, and computer programproducts.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing is apparent from the following more particular descriptionof example embodiments, as illustrated in the accompanying drawings inwhich like reference characters refer to the same parts throughout thedifferent views. The drawings are not necessarily to scale, emphasisinstead being placed upon illustrating embodiments.

FIG. 1A is a high-level block diagram of forwarding at nodes of anetwork, according to some embodiments.

FIG. 1B is a high-level block diagram of forwarding and caching at nodesof a network, according to some embodiments.

FIG. 2 is a path graph of caching at nodes of a network, according tosome embodiments.

FIG. 3 illustrates a network topology in which some embodiments may beconfigured to operate.

FIG. 4A illustrates caching gain with power-law demand for variousnetwork topologies in which some embodiments may be configured tooperate.

FIG. 4B illustrates caching gain with uniform demand for various networktopologies in which some embodiments may be configured to operate.

FIG. 5 illustrates running time for various network topologies in whichsome embodiments may be configured to operate.

FIGS. 6A-B illustrate caching gain for varying service rates for variousnetwork topologies in which some embodiments may be configured tooperate.

FIGS. 7A-B illustrate caching gain for varying arrival rates for variousnetwork topologies in which some embodiments may be configured tooperate.

FIGS. 8A-B illustrate caching gain for varying cache capacities forvarious network topologies in which some embodiments may be configuredto operate.

FIG. 9 illustrates a further network topology in which some embodimentsmay be configured to operate.

FIG. 10 is a flow diagram illustrating an example embodiment of a methodof the present disclosure.

FIG. 11 is a network diagram that illustrates a computer network orsimilar digital processing environment in which embodiments of thepresent disclosure may be implemented.

FIG. 12 is a block diagram of an example internal structure of acomputer (e.g., client processor/device or server computers) in thecomputer system or apparatus of FIG. 11, according to some embodiments.

DETAILED DESCRIPTION

A description of example embodiments follows.

The teachings of all patents, published applications and referencescited herein are incorporated by reference in their entirety.

Existing approaches to forwarding requests for content in a computernetwork have multiple problems. For example, existing approaches are notcapable of handling caching of content in an efficient and effectivemanner. In existing approaches, the problem of caching is anon-deterministic polynomial time (NP)-hard problem, and thereforechallenging to implement with efficient computational complexity.Existing approaches of caching are very costly in terms of delay, queuesize, queuing probability, and other queue size dependent costs.

Embodiments of the present disclosure solve the problems associated withexisting approaches. Embodiments provide computer methods, systems,networks, network nodes, and computer program products for handlingcaching to deliver content. In stark contrast to existing approaches,embodiments reduce computational complexity and overcome the problem ofNP-hard optimization, by applying greedy methods. In addition,embodiments of the present disclosure replicate and store contents in anarbitrary network topology to reduce costs that depend on trafficcongestion is provided herein. Such costs include, but are not limitedto, delay, queue size, queuing probability, and other queue sizedependent costs.

Embodiments of the present disclosure have multiple advantages comparedwith existing approaches. As such, according to some embodiments, thecomputer methods, systems, networks, network nodes, and computer programproducts make caching decisions. An embodiment of a method makes cachingdecisions for arbitrary network topologies and storage constraints. Anembodiment of the method determines caching decisions taking intoaccount congestion via arbitrary convex cost functions, in contrast toexisting approaches art that may operates under linear costs.

According to some embodiments, additional advantages include but are notlimited to the following. An embodiment of the method makes cachingdecisions to reduce traffic dependent costs at network queues. Incontrast to existing approaches, an embodiment of the method hasprovable optimality guarantees, attaining a cost reduction within afactor of approximately 0.67 from the optimal cost reduction attained byexisting methods. An embodiment of the method avoids randomization andsampling, attaining provable guarantees at a reduced computationalcomplexity. An embodiment of the method approximates arbitrary convexcost functions via power and Taylor series approximations, leading to afast and efficient implementation avoiding randomization and sampling.An embodiment of the method significantly outperforms existingapproaches in both caching decisions and computational complexity.

It should be understood that descriptions with respect to oneembodiment, e.g., a method, may equally apply to alternativeembodiments, e.g., a network, network node, computer program product, orsystem.

Embodiments may be implemented in a system where content is to be placed(i.e., content distribution) in a network with varying demand. Some suchnetworks may include by are not limited to content delivery networks,information centric networks, peer-to-peer networks, or cloud computing.

FIG. 1A is a high-level block diagram of forwarding at nodes of anetwork 100, according to some embodiments. As illustrated in FIG. 1A,one or more requests (or packets, or packet requests) 156 may arrive 160in the network and pass through a series of nodes 102 along a path 108through the network 100 to a point of packet departure 162. The requestmay pass through one or more queues 136 along each edge 120 of the path108.

FIG. 1B is a high-level block diagram of forwarding and caching at nodesof a network, according to some embodiments. As illustrated in FIG. 1B,one or more requests 106 may arrive 150 in the network (or networkmanager, or system) 100 and content items 112 may be depart 158 thenetwork 100 are being retrieved by a cache hit 152.

As further illustrated in FIG. 1B, the network (or network manager, orsystem) 100 may be configured to collect statistics regarding requests106 made and paths 108 taken by the requests 106 from source nodes 102 ato server nodes 102 c via intermediate nodes 102 b. The server nodes 102may be associated with one or more servers 150. The source nodes 102 a,intermediate nodes 102 b, and server nodes 102 c may be interconnectedby edges 120 having queues 136 with respective queue sizes associatedtherewith. The requests 106 may include indications 110 of content items112 to be retrieved. The content items 112 may be stored at the servernodes. The intermediate nodes 102 b may be configurable to cache thecontent items 112 up to a caching capacity. The network manager may beconfigured to perform caching decisions that determine which of thecontent items 112 are to be cached at which of the intermediate nodes120 b, based upon costs that are monotonic, non-decreasing functions ofthe sizes of the queues 136.

Some embodiments include queues, including but not limited to M/M/1queues or any other queues known to one skilled in the art, in whichnodes act as caches that store objects. As known to one skilled in theart, an M/M/k queue represents queue length in a system having a numberof servers k (one server for an M/M/1 queue), where arrivals may bedetermined by Poisson process and service times may have an exponentialdistribution.

Exogenous requests for objects may be routed towards nodes that storethem; as a result, object traffic in the network is determined may notonly by demand but, also, by where objects are cached. Embodimentsdetermine how to place objects in caches to attain a certain designobjective, such as, e.g., minimizing network congestion or retrievaldelays. Embodiments show that for a broad class of objectives, includingminimizing both the expected network delay and the sum of network queuelengths, this optimization problem can be cast as an NP-hard submodularmaximization problem. Embodiments show that a continuous greedy method[1] may attain a ratio close to 1−1/e≈0.63 using a deterministicestimation via a power series, which may drastically reduce executiontime compared with existing approaches that rely on sampling.Embodiments generalize, beyond M/M/1 queues, to networks of M/M/k andsymmetric M/D/1 queues.

1 INTRODUCTION

According to some embodiments, Kelly networks [2] include multi-classnetworks of queues capturing a broad array of queue service disciplines,including FIFO, LIFO, and processor sharing. Both Kelly networks andtheir generalizations (including networks of quasi-reversible andsymmetric queues) are well studied and classic topics [2], [3], [4],[5]. One of their most appealing properties is that their steady-statedistributions have a product-form: as a result, steady state propertiessuch as expected queue sizes, packet delays, and server occupancy rateshave closed-form formulas as functions of, e.g., routing and schedulingpolicies.

Some embodiments include Kelly networks, in which nodes are equippedwith caches, i.e., storage devices of finite capacity, which can be usedto store objects. In some embodiments, Exogenous requests for objectsmay be routed towards nodes that store them; upon reaching a node thatstores the requested object, a response packet containing the object isrouted towards the request source. As a result, embodiments determineobject traffic in the network not only by the demand but, also, by whereobjects are cached. Some embodiments may include one or more networkingapplications involving the placement and transmission of content. Assuch, some embodiments include information centric networks [6], [7],[8], content delivery networks [9], [10], web-caches [11], [12], [13],wireless/femtocell networks [14], [15], [16], and peer-to-peer networks[17], [18].

According to some embodiments, determining the object placement, i.e.,how to place objects in network caches, is a decision that can be madeby the network designer in response to object popularity and demand. Tothat end, embodiments may determine how to place objects in caches sothat traffic attains a design objective such as minimizing delay.

Embodiments provide various contributions.

First, embodiments solve the problem of optimizing the placement ofobjects in caches in Kelly cache networks of M/M/1 queues, with theobjective of minimizing a cost function of the system state. Embodimentsillustrate that, for a broad class of cost functions, including packetdelay, system size, and server occupancy rate, this optimization mayamount to a submodular maximization problem with matroid constraints.This result applies to general Kelly networks with fixed service rates;in particular, it holds for FIFO, LIFO, and processor sharingdisciplines at each queue.

According to some embodiments, the continuous greedy method [1] mayattain a 1−1/e approximation for this NP-hard problem. However, it doesso by computing an expectation over a random variable with exponentialsupport via randomized sampling. The number of samples required toattain the 1−1/e approximation guarantee can be prohibitively large inrealistic settings.

Second, according to some embodiments, for Kelly networks of M/M/1queues, this randomization can be entirely avoided: a closed-formsolution may be computed using the Taylor expansion of the problemobjective herein. Unlike some embodiments, existing approaches fail toidentify a submodular maximization problem that exhibits this structure,and to exploit it to eschew sampling.

In addition, some embodiments extend results to networks of M/M/k andsymmetric M/D/1 queues, prove a negative result: submodularity does notarise in networks of M/M/1/k queues. Embodiments include method that areextensively evaluated over several synthetic and real-life topologies.

Embodiments include and are described herein with respect to relatedwork (Section 2 herein), a mathematical model of a Kelly cache network(Section 3 herein), and results on submodularity and thecontinuous-greedy method in networks of M/M/1 queues (in Sections 4 and5 herein, respectively). Embodiments include extensions that aredescribed herein (Section 6 herein), numerical evaluation (Section 7herein).

2 RELATED WORK

Our approach is closest to, and inspired by, recent work by Shanmugam etal. [19] and Ioannidis and Yeh [8]. Ioannidis and Yeh consider a settingvery similar to ours but without queuing: edges are assigned a fixedweight, and the objective is a linear function of incoming trafficscaled by these weights. According to some embodiments, one where edgecosts are linear (see also Section 3.2). Shanmugam et al. [19] study asimilar optimization problem, restricted to the context of femtocaching.The authors show that this is an NP-hard, submodular maximizationproblem with matroid constraints. They provide a 1−1/e approximationmethod based on a technique by Ageev and Sviridenko [20]: this involvesmaximizing a concave relaxation of the original objective, and roundingvia pipage-rounding [20]. Ioannidis and Yeh show that the sameapproximation technique applies to more general cache networks withlinear edge costs. They also provide a distributed, adaptive method thatattains an 1−1/e approximation. The same authors extend this frameworkto jointly optimize both caching and routing decisions [21].

Some embodiments can be seen as an extension of [8], [19], in that theyincorporate queuing in the cache network. In contrast to both [8] and[19] however, costs like delay or queue sizes are highly non-linear inthe presence of queuing. From a technical standpoint, this departurefrom linearity enables embodiments herein to employ significantlydifferent optimization methods than the ones in [8], [19]. Someembodiments do not admit a concave relaxation and, consequently, thetechnique by Ageev and Sviridenko [20] used in [8], [19] does not apply.Instead, embodiments solve a non-convex optimization problem directly(c.f. Eq. (13)) using the so-called continuous-greedy method.

Existing approaches solve cache optimization problems under restrictedtopologies [9], [22], [23], [24], [25]. These works model the network asa bipartite graph: nodes generating requests connect directly to cachesin a single hop. Unlike embodiments, the existing approaches do notreadily generalize to arbitrary topologies. In general, theapproximation technique of Ageev and Sviridenko [20] applies to thisbipartite setting, and additional approximation methods are devised forseveral variants [9], [22], [23], [24]. Embodiments improve by (a)considering a multi-hop setting, and (b) introducing queuing, which noneof the above works considers.

Submodular function maximization subject to matroid constraints appearsin many important problems in combinatorial optimization; for a briefreview of the topic and applications, see [26] and [27], respectively.Nemhauser et al. [28] show that the greedy method produces a solutionwithin ½ of the optimal. Vondrák [29] and Calinescu et al. [1] show thatthe continuous-greedy method produces a solution within (1−1/e) of theoptimal in polynomial time, which cannot be further improved [30]. Inthe general case, the continuous-greedy method requires sampling toestimate the gradient of the so-called multilinear relaxation of theobjective (see Section 5). According to some embodiments, MAXCG, theoptimization problem studied herein, exhibits additional structure:Embodiments use this to construct a sampling-free estimator of thegradient via a power-series or Taylor expansion. Some embodiments usesuch an expansion to eschew sampling; this technique may apply tosubmodular maximization problems beyond MAXCG.

3 MODEL

Motivated by applications such as ICNs [6], CDNs [9], [10], andpeer-to-peer networks [17], some embodiments include Kelly cachenetworks. In contrast to classic Kelly networks, each node is associatedwith a cache of finite storage capacity. Exogenous traffic consisting ofrequests is routed towards nodes that store objects; upon reaching anode that stores the requested object, a response packet containing theobject is routed towards the node that generated the request. As aresult, content traffic in the network is determined not only by demandbut, crucially, by how contents are cached. Appendix A.1 includesclassic Kelly networks. An illustration highlighting the differencesbetween Kelly cache networks, introduced below, and classic Kellynetworks, can be found in FIG. 1A.

Some embodiments describe Kelly cache networks in terms of FIFO M/M/1queues, the product form distribution (c.f. (4)) arises for manydifferent service principles beyond FIFO (c.f. Section 3.1 of [2])including Last-In First-Out (LIFO) and processor sharing. All resultsincluded herein extend to these service disciplines; more extensions areincluded in Section 6.

3.1 Kelly Cache Networks

Graphs and Paths. We use the notation G (V, E) for a directed graph Gwith nodes V and edges E⊆V×V. A directed graph is called symmetric orbidirectional if (u, v)∈E if and only if (v, u)∈E. A path p is asequence of adjacent nodes, i.e., p=p₁, p₂, . . . , p_(K) where (p_(k),p_(k+1))∈E, for all 1≤i<K≡|p|. A path is simple if it contains no loops(i.e., each node appears once). We use the notation v∈p, where v∈V, toindicate that node v appears in the path, and e∈p, where e=(u, v)∈E, toindicate that nodes u,v are two consecutive (and, therefore, adjacent)nodes in p. For v∈p, where p is simple, embodiments denote byk_(p)(v)∈{1, . . . , |p|} the position of node v∈V in p, i.e.,k_(p)(v)=k if p_(k)=V.

Network Definition. Some embodiments include a Kelly network of M/M/1FIFO queues, represented by a symmetric directed graph G (V, E). As inclassic Kelly networks, each edge e∈E is associated with an M/M/1 queuewith service rate μ_(e). We associate queues with edges forconcreteness. Alternatively, queues can be associated with nodes, orboth nodes and edges; all such representations lead to product formdistributions (4), and results herein extend to these cases. Inaddition, each node has a cache that stores objects of equal size from aset

, the object catalog. Each node v∈V may store at most c_(v)∈

objects from

in its cache. Hence, if x_(vi)∈{0,1} is a binary variable indicatingwhether node v∈V is storing object i∈

, then Σ_(i∈)

x_(vi)≤c_(v), for all v∈V. Some embodiments include x=[x_(vi)]_(v∈V,i∈)

∈{0,1}^(|V∥)

^(|) as the global placement or, simply, placement vector. We denote by

={x∈{0,1}^(|V∥)

^(|):Σ_(i∈)

x _(vi) <c _(v) ,∀v∈V},  (1)

the set of feasible placements that satisfy the storage capacityconstraints. Some embodiments assume that for every object i∈

, there exists a set of nodes

_(i)⊆V that permanently store i. Some embodiments include nodes in

_(i) as designated servers for i∈

. Some embodiments assume that designated servers store i in permanentstorage outside their cache. Put differently, the aggregate storagecapacity of a node is c′_(v)=c_(v)+|{i:v∈

_(i)}|, but only the non-designated slots c_(v) are part of the system'sdesign.

Object Requests and Responses. Traffic in the cache network consists oftwo types of packets: requests and responses, as shown in FIG. 1B.Requests for an object are always routed towards one of its designatedservers, ensuring that every request is satisfied. However, requests mayterminate early: upon reaching any node that caches the requestedobject, the latter generates a response carrying the object. This isforwarded towards the request's source, following the same path as therequest, in reverse. Consistent with prior literature [8], [21],embodiments treat request traffic as negligible when compared toresponse traffic, which carries objects, and henceforth focus only onqueues bearing response traffic.

Formally, a request and its corresponding response are fullycharacterized by (a) the object being requested, and (b) the path thatthe request follows. That is, for the set of requests

, a request r∈

is determined by a pair (i^(r), p^(r)), where i^(r)∈

is the object being requested and p^(r) is the path the request follows.Each request r is associated with a corresponding Poisson arrivalprocess with rate λ^(r)≥0, independent of other arrivals and servicetimes. We denote the vector of arrival rates by λ=

∈

. For all r∈

, Some embodiments assume that the path p^(r) is well-routed [8], thatis: (a) path p^(r) is simple, (b) the terminal node of the path is adesignated server, i.e., a node in

_(i), and (c) no other intermediate node in p^(r) is a designatedserver. As a result, requests are always served, and response packets(carrying objects) always follow a sub-path of p^(r) in reverse towardsthe request source (namely, p₁ ^(r)).

Steady State Distribution. Given an object placement x∈

, the resulting system is a multi-class Kelly network, with packetclasses determined by the request set

. This is a Markov process over the state space determined by queuecontents. In particular, let n_(e) ^(r) be the number of packets ofclass r∈

in queue e∈E, and n_(e)=

n_(e) ^(r) be the total queue size. The state of a queue n_(e)∈

^(n) ^(e) , e∈E, is the vector of length n_(e) representing the class ofeach packet in each position of the queue. The system state is thengiven by n=[n_(e)]_(e∈E); embodiments denote by Ω the state space ofthis Markov process.

In contrast to classic Kelly networks, network traffic and, inparticular, the load on each queue, depend on placement x. Indeed, if(v, u)∈p^(r) for r∈

, the arrival rate of responses of class r∈

in queue (u, v)∈E is:

$\begin{matrix}{{\lambda_{({u,v})}^{r}\left( {x,\lambda} \right)} = {\lambda^{r}{\prod\limits_{k^{\prime} = 1}^{k_{p}r\text{(v)}}\; \left( {{{\left. {1 - x_{\; {p_{\; k^{\prime}}^{\; r}\; i^{\; r}}}} \right),\mspace{14mu} {{for}\mspace{14mu} \left( {v,u} \right)}} \in p^{r}},} \right.}}} & (2)\end{matrix}$

i.e., responses to requests of class r pass through edge (u, v)∈E if andonly if no node preceding u in the path p^(r) stores object i^(r)—seealso FIG. 1B. As μ_((u,v)) is the service rate of the queue in (u, v)∈E,the load on edge (u, v)∈E is:

$\begin{matrix}{{\rho_{({u,v})}\left( {x,\lambda} \right)} = {\frac{1}{\mu_{({u,v})}}\Sigma_{r \in {:{{({v,u})} \in p^{r}}}}{{\lambda_{({u,v})}^{r}\left( {x,\lambda} \right)}.}}} & (3)\end{matrix}$

The Markov process {n(t); t≥0}_(t≥0) is positive recurrent whenρ_((u,v))(x,λ)<1, for all (u, v)∈E [2], [31]. Then, the steady-statedistribution has a product form, i.e.:

$\begin{matrix}{{{(n)} = {\prod_{\epsilon \in E}{_{e}\left( n_{e} \right)}}},\; {n \in \Omega},{{{where}\mspace{14mu} {\pi_{e}\left( n_{e} \right)}} = {\left( {1 - {\rho_{e}\left( {x,\lambda} \right)}} \right){\prod_{r \in {:{e \in p^{r}}}}\left( \frac{\lambda_{e}^{r}\left( {x,\lambda} \right)}{\mu_{e}} \right)^{n_{e}^{r}}}}},\mspace{14mu} {{and}\mspace{14mu} {\lambda_{e}^{r}\left( {x,\lambda} \right)}},} & (4)\end{matrix}$

π_(e)(x, λ) are given by (2), (3), respectively.

Stability Region. Given a placement x∈

, a vector of arrival rates λ=

yields a stable (i.e., positive recurrent) system if and only ifλ∈λ_(x), where

∇_(x):={λ:λ≥0,ρ_(e)(x,λ)<1,∀e∈E}⊂

,  (5)

where loads ρ_(e), e∈E, are given by (3). Conversely, given a vector λ∈

,

_(λ) ={x∈

:ρ _(e)(x,λ)<1,∀e∈E}⊆

  (6)

is the set of feasible placements under which the system is stable. Itis easy to confirm that, by the monotonicity of ρ_(e) w.r.t. x, if x∈

_(λ) and x′≥X, then x′∈

_(λ), where the vector inequality x′≥x is component-wise. In particular,if 0∈D_(λ) (i.e., the system is stable without caching), then

_(λ)=

.

3.2 Cache Optimization

Given a Kelly cache network represented by graph G(V,E), service ratesμ_(e), e∈E, storage capacities c_(v), v∈V, a set of requests

, and arrival rates λ_(r), for r∈

, embodiments determine placements x∈

that optimize a certain design objective. Some embodiments determineplacements that are solutions to optimization problems of the followingform:

$\begin{matrix}{{{Minimize}:\; {\overset{MINCOST}{{C(x)} = \Sigma_{\epsilon \in E}}{C_{e}\left( {\rho_{e}\left( {x,\lambda} \right)} \right)}}},} & \left( {7a} \right) \\{{{subj}.\mspace{14mu} {{to}:\; {x \in D_{\lambda}}}},} & \left( {7b} \right)\end{matrix}$

where C_(e): [0,1)→

₊, e∈E, are positive cost functions, ρ_(e):

×

→

₊ is the load on edge e, given by (3), and

_(λ) is the set of feasible placements that ensure stability, given by(6). We make the following standing assumption on the cost functionsappearing in MINCOST:

Assumption 1. For all e∈E, functions C_(e): [0,1)→

₊ are convex and non-decreasing on [0,1).

Assumption 1 is natural; it holds for many cost functions that oftenarise in practice. Embodiments include several examples:

Example 1. Queue Size: Under steady-state distribution (4), the expectednumber of packets in queue e∈E is given by

${{\left\lbrack n_{e} \right\rbrack} = {{C_{e}\left( \rho_{e} \right)} = \frac{\rho_{e}}{1 - \rho_{e}}}},$

which is convex and non-decreasing for ρ_(e)∈[0,1). Hence, the expectedtotal number of packets in the system in steady state can be written asthe sum of such functions.

Example 2. Delay: From Little's Theorem [31], the expected delayexperienced by a packet in the system is

${{\lbrack T\rbrack} = {\frac{1}{{\lambda }_{1}}{\sum_{e \in E}{\left\lbrack n_{e} \right\rbrack}}}},$

where ∥λ∥₁=

λ^(r) is the total arrival rate, and

[n_(e)] is the expected size of each queue. Thus, the expected delay canalso be written as the sum of functions that satisfy Assumption 1.According to some embodiments, the same is true for the sum of theexpected delays per queue e∈E, as the latter are given by

${{\left\lbrack T_{e} \right\rbrack} = {{\frac{1}{\lambda_{e}}{\left\lbrack n_{e} \right\rbrack}} = \frac{1}{\mu_{e}\left( {1 - \rho_{e}} \right)}}},$

which are also convex and non-decreasing in ρ_(e).

Example 3. Queuing Probability/Load per Edge: In a FIFO queue, thequeuing probability is the probability of arriving in a system where theserver is busy; this is given by C_(e)(ρ_(e))=ρ_(e)=λ_(e)/μ_(e), whichis again non-decreasing and convex. This is also, of course, the loadper edge. By treating 1/μ_(e) as the weight of edge e∈E, this settingrecovers the objective of [8] as a special case of some embodiments.

Example 4. Monotone Separable Costs: More generally, Assumption 1 holdsfor arbitrary monotone separable costs, i.e., functions that are (1)separable across queues, (2) depend only on queue sizes n_(e), and (3)are non-decreasing.

Formally:

Lemma 1. Consider a state-dependent cost function c: Ω→

₊ such that:

${{c(n)} = {\sum\limits_{e \in E}^{\;}\; {c_{e}\left( n_{e} \right)}}},$

where c_(e):

→

₊, e∈E, are non-decreasing functions of the queue sizes n_(e), e∈E.Then, the steady state cost under distribution (4) has precisely form(7a) with convex costs, i.e.,

${\left\lbrack {c(n)} \right\rbrack} = {\sum\limits_{e \in E}^{\;}\; {C_{e}\left( \rho_{e} \right)}}$

where C_(e):[0,1)→

₊ satisfy Assumption 1.

Proof. As the cost at state n∈Ω can be written asc(n)=Σ_(e∈E)c_(e)(n_(e)), embodiments may include that

[c(n)]=Σ_(e∈E)

[c_(e)(n_(e))]. On the other hand, as c_(e)(n_(e))≥0,

$\begin{matrix}\begin{matrix}{{\left\lbrack {c_{e}\left( n_{e} \right)} \right\rbrack} = {\sum\limits_{n = 0}^{\infty}\; {{c_{e}(n)}{P\left( {n_{e} = n} \right)}}}} \\{\mspace{95mu} {= {{c_{e}(0)} + {\sum\limits_{n = 0}^{\infty}\; {\left( {{c_{e}\left( {n + 1} \right)} - {c_{e}(n)}} \right){P\left( {n_{e} > n} \right)}}}}}} \\{ {\overset{\text{(23)}}{=}{{c_{e}(0)} + {\sum\limits_{n = 0}^{\infty}\; {\left( {{c_{e}\left( {n + 1} \right)} - {c_{e}(n)}} \right)\rho_{e}^{n}}}}}}\end{matrix} & (8)\end{matrix}$

As c_(e) is non-decreasing, c_(e)(n+1)−c_(e)(n)≥0 for all n∈

. On the other hand, for all n∈

, ρ^(n) is a convex non-decreasing function of ρ in [0,1), so

[c_(e)(n_(e))] is a convex function of ρ as a positively weighted sum ofconvex non-decreasing functions.

In summary, MINCOST captures many natural cost objectives, whileAssumption 1 holds for any monotonically increasing cost function thatdepends only on queue sizes.

TABLE 1 Notation Summary Kelly Cache Networks G(V, E) Network graph,with nodes V and edges E k_(p)(v) position of node v in path p μ(u,v)Service rate of edge (u,v) ϵ E

Set of classes/types of requests λ^(r) Arrival rate of class r ϵ  

p^(r) Path followed by class r ϵ  

i^(r) Object requested by class r ϵ  

Item catalog

_(t) Set of designated servers of i ϵ  

c_(v) Cache capacity at node v ϵ V x_(vt) Variable indicating whether vϵ V stores i ϵ  

x Placement vector of x_(vt)s, in {0, 1} ^(|V||C|) λ Vector of arrivalrates λ^(r), r ϵ  

λ_(e) ^(r) Arrival rate of class r responses over edge e ϵ E ρ_(e) Loadon edge e ϵ E Ω State space n Global state vector in Ω π(n) Steady-statedistribution of n ϵ Ω n_(e) State vector of queue at edge e ϵ Eπ_(e)(n_(e)) Marginal of steady-state distribution of queue n_(e) n_(e)Size of queue at edge e ϵ E Cache Optimization

Global Cost function

_(e) Cost function of edge e ϵ E

Set of placements x satisfying capacity constraints x₀ A feasibleplacement in  

F(x) Caching gain of placement x over x₀ y_(vt) Probability that v ϵ Vstores i ϵ  

y Vector of marginal probabilities y_(vtr) in {0, 1} ^(|V||C|) G(y)Multilinear extension under marginals y D_(λ) Set of placements underwhich system is stable under arrivals λ

Convex hull of constraints of MAXCG Conventions supp(.) Support of avector conv(.) Convex hull of a set [x] + 1 Vector equal to x with i-thcoordinate set to 1 [x] − 1 Vector equal to x with i-th coordinate setto 0 0 Vector of zeros

4 SUBMODULARITY AND THE GREEDY METHOD

According to some embodiments, the problem of MINCOST is NP-hard; thisis true even when cost functions c_(e) are linear, and the objective isto minimize the sum of the loads per edge [8], [19]. Embodiments solvethis problem. Embodiments include the objective of MINCOST as asupermodular set function. Embodiments contribute by showing that thisproperty is a direct consequence of Assumption 1.

Cost Supermodularity and Caching Gain. Some embodiments observe that thecost function C in MINCOST can be naturally expressed as a set function.According to some embodiments, for S⊂V×

, let x_(S)∈{0,1

be the binary vector whose support is S (i.e., its non-zero elements areindexed by S). As there is a 1-1 correspondence between a binary vectorx and its support supp(x), embodiments interpret C: {0,1

→

₊ as set function C: V×

:→

₊ via C(S)

C(x_(S)). Then, the following theorem holds:

Theorem 1. Under Assumption 1, C(S)

C(x_(S)) is non-increasing and supermodular over {supp(x): x∈

_(λ)}.

Proof. Some embodiments prove the following auxiliary lemma:

Lemma 2. Let f:

→

be a convex and non-decreasing function. Also, let g: χ→

be a non-increasing supermodular set function. Then h(x)

f(g(x)) is also supermodular.

Proof. Since g is non-increasing, for any x, x′⊆χ embodiments mayinclude

g(x∩x′)≥g(x)≥g(x∪x′),

g(x∩x′)≥g(x′)≥g(x∪x′),

Due to supermodularity of g, embodiments can find α, α′∈[0,1], α+α′≤1such that

g(x)=(1−α)g(x∩x′)+αg(x∪x′),

g(x′)=(1−α′)g(x∩x′)+α′g(x∪x′).

Then, some embodiments have

$\begin{matrix}{{f\left( {g(x)} \right)} + {f\left( {g\left( x^{\prime} \right)} \right)}} \\{{\leq {{\left( {1 - \alpha} \right){f\left( {g\left( {x\bigcap x^{\prime}} \right)} \right)}} + {{\alpha f}\left( {g\left( {x\bigcup x^{\prime}} \right)} \right)}}}} \\{{{{+ \left( {1 - \alpha^{\prime}} \right)}{f\left( {g\left( {x\bigcap x^{\prime}} \right)} \right)}} + {\alpha^{\prime}{f\left( {g\left( {x\bigcup x^{\prime}} \right)} \right)}}}} \\{{= {{f\left( {g\left( {x\bigcap x^{\prime}} \right)} \right)} + {f\left( {g\left( {x\bigcup x^{\prime}} \right)} \right)}}}} \\{{{+ \left( {1 - \alpha - \alpha^{\prime}} \right)}\left( {{f\left( {g\left( {x\bigcap x^{\prime}} \right)} \right)} - {f\left( {g\left( {x\bigcup x^{\prime}} \right)} \right)}} \right)}} \\{{{\leq {{f\left( {g\left( {x\bigcap x^{\prime}} \right)} \right)} + {f\left( {g\left( {x\bigcup x^{\prime}} \right)} \right)}}},}}\end{matrix}$

where the first inequality is due to convexity of f, and the second oneis because α+α′≤1 and f(g(⋅)) is non-increasing. This proves h(x)

f(g(x)) is supermodular.

To conclude the proof of Thm. 1, observe that it is easy to verify thatρ_(e), ∀e∈E, is supermodular and non-increasing in S (see also [8]).Since, by Assumption 1, C_(e) is a non-decreasing function, then,C_(e)(S)

C_(e)(ρ_(u,v)(S)) is non-increasing. By Lemma 2, C_(S)(S) is alsosupermodular. Hence, the cost function is non-increasing andsupermodular as the sum of non-increasing and supermodular functions.

In light of the observations in Section 3.2 regarding Assumption 1, Thm.1 implies that supermodularity arises for a broad array of natural costobjectives, including expected delay and system size; it also appliesunder the full generality of Kelly networks, including FIFO, LIFO, andround robin service disciplines. Armed with this theorem, someembodiments convert MINCOST to a submodular maximization problem. Indoing so, some embodiments solve the problem that domain

_(λ), determined not only by storage capacity constraints, but also bystability, may be difficult to characterize. Nevertheless, embodimentsillustrate that a problem that is amenable to approximation can beconstructed, provided that a placement x₀∈

_(λ) is known.

In particular, suppose that embodiments include access to a single x₀∈

_(λ). We define the caching gain F:

_(λ)→

₊ as F(x)=C(x₀)−C(x). Note that, for x≥x₀, F(x) is the relative decreasein the cost compared to the cost under x₀. We consider the followingoptimization problem:

$\begin{matrix}{{{Maximize}\text{:}\mspace{14mu} \overset{{MAX}\; {CG}}{{F(x)} = C}\left( x_{0} \right)} - {C(x)}} & \left( {9a} \right) \\{{{{{subj}.\mspace{11mu} {to}}:\mspace{14mu} x} \in },{x \geq x_{0}}} & \left( {9b} \right)\end{matrix}$

Some embodiments observe that, if 0∈

_(λ), then

_(λ)=

; in this case, taking x₀=0 ensures that problems MINCOST and MAXCG areequivalent. If x₀≠0, the above formulation attempts to maximize the gainrestricted to placements x∈

that dominate x₀: such placements necessarily satisfy x∈

_(λ). Thm. 1 has the following immediate implication:

Corollary 1. The caching gain F(S)

F(x_(S)) is non-decreasing and submodular over {supp(x): x∈

_(λ)}.

Greedy Method. Constraints (9b) define a (partition) matroid [1], [19].This, along with the submodularity and monotonicity of F illustrate thatembodiments may produce a solution within ½-approximation from theoptimal via the greedy method [32]. The method, summarized in Method 1,iteratively allocates items to caches that yield the largest marginalgain. The solution produced by Method 1 is guaranteed to be within a½-approximation ratio of the optimal solution of MAXCG [28]. Theapproximation guarantee of ½ is tight:

Lemma 3. For any ε>0, there exists a cache network the greedy methodsolution is within ½+ε from the optimal, when the objective is the sumof expected delays per edge.

Proof. Consider the path topology illustrated in FIG. 2. Assume thatrequests for files 1 and 2 are generated at node u with rates λ₁=λ₂=δ,for some δ∈(0,1). Files 1 and 2 are stored permanently at v and z,respectively. Caches exist only on u and w, and have capacityc_(u)=c_(w)=1. Edges (u, v), (w, z) have bandwidthμ_((u,v))=μ_((w,z))=1, while edge (u, w) is a high bandwidth link,having capacity M>>1. Let x₀=0. The greedy method starts from emptycaches and adds item 2 at cache u. This is because the caching gain fromthis placement is

${{c_{({u,w})} + c_{({w,z})}} = {\frac{1}{M - \delta} + \frac{1}{1 - \delta}}},$

while the caching gain of all other decisions is at most

$\frac{1}{1 - \delta}.$

Any subsequent caching decisions do not change the caching gain. Theoptimal solution is to cache item 1 at u and item 2 at w, yielding acaching gain of 2/(1−δ). Hence, the greedy solution attains anapproximation ratio

$0.5 \cdot {\left( {1 + \frac{1 - \delta}{M - \delta}} \right).}$

By appropriately choosing M and δ, this can be made arbitrarily close to0.5.

Algorithm 1 Greedy Input F :  

  →  

 ₊,x₀  1: x ← x₀  2: while A(x) := {(v,i) ϵ V ×  

  : x + e_(vt) ϵ  

 } is not empty do  3:  (v*, i*) ← arg max_((v,t)ϵA(x)) (F(x + e_(et)) −F(x))  4:  x ← x + e_(v)*_(t)*  5: end while  6: return x

As illustrated in Section 7 herein, the greedy method performs well inpractice for some topologies; however, Lemma 3 may include alternativemethods, that attain improved approximation guarantees. According tosome embodiments, it is easy to extend Lemma 3 to other objectives,including, e.g., expected delay, queue size, etc. According to someembodiments, tight instances can be constructed using caches withcapacities larger than 1 (see, e.g., FIG. 3).

5 CONTINUOUS-GREEDY METHOD

According to some embodiments, the continuous-greedy method by Calinescuet al. [1] attains a tighter guarantee than the greedy method, raisingthe approximation ratio from 0.5 to 1−1/e≈0.63. The method maximizes theso-called multilinear extension of objective F, thereby obtaining afractional solution Y in the convex hull of the constraint space. Theresulting solution is then rounded to produce an integral solution. Themethod requires estimating the gradient of the multilinear extension viasampling; interestingly, embodiments prove that MaxCG exhibitsadditional structure, which can be used to construct a polynomial-timeestimator of this gradient that eschews sampling altogether, by using aTaylor expansion.

Algorithm 2 Continuous-Greedy Input G :  

  →  

 ₊, x₀, stepsize 0 < γ ≤ 1  1: t ← 0, k ← 0 y₀ ← x₀  2: while t < 1 do 3:  m_(k) ← arg  

  (m, ∇G(y_(k)))  4:  γ_(k) ← min {γ, 1 − t}  5:  γ_(k+1) ← γ_(k) +γ_(k) m_(k),t ← t + γ_(k), k ← k + 1  6: end while  7: return y_(k)

5.1 Method Overview

Formally, the multilinear extension of the caching gain F is defined asfollows. Define the convex hull of the set defined by the constraints(9b) in MAXCG as:

=conv({x:x∈

,x≥x ₀})⊆[0,1]

  (10)

Intuitively, y∈

is a fractional vector in

satisfying the capacity constraints, and the bound y≥x₀.

Given a y∈

, consider a random vector x in {0,1

generated as follows: for all v∈V and i∈

, the coordinates x_(vi)∈{0,1} are independent Bernoulli variables suchthat P(x_(vi)=1)=y_(vi). The multilinear extension G:

→

₊ of F:

_(λ)→

₊ is defined via following expectation G(y)=

_(y)[F(x)], parameterized by y∈

, i.e.,

$\begin{matrix}{{{G(y)} = {\sum\limits_{x \in {\{{0,1}\}}^{{v}{c}}}{{F(x)} \times {\prod\limits_{{({v,i})} \in {V \times C}}{y_{vi}^{x_{vl}}\left( {1 - y_{vi}} \right)}^{1 - x_{vi}}}}}},} & (11)\end{matrix}$

The continuous-greedy method, summarized in Method 2, proceeds by firstproducing a fractional vector y∈

. Starting from y₀=x₀, the method iterates over:

$\begin{matrix}{{m_{k} \in {\arg \; {\max_{m \in \overset{\sim}{}}{\langle{m,{\nabla{G\left( y_{k} \right)}}}\rangle}}}},} & \left( {12a} \right) \\{y_{k + 1} = {y_{k} + {\gamma_{k}m_{k}}}} & \left( {12b} \right)\end{matrix}$

for an appropriately selected step size γ_(k)∈[0,1]. Intuitively, thisyields an approximate solution to the non-convex problem:

Maximize: G(y)  (13a)

subj. tαy∈

.  (13b)

Even though (13) may not be convex, the output of Method 2 is within a1−1/e factor from the optimal solution y*∈

of (13). This fractional solution can be rounded to produce a solutionto MaxCG with the same approximation guarantee using either the pipagerounding [20] or the swap rounding [1], [33] schemes: for completeness,both are referred to herein in Appendix B.

Note that the maximization in (12a) is a Linear Program (LP): itinvolves maximizing a linear objective subject to a set of linearconstraints, and can thus be computed in polynomial time. However, thispresumes access to the gradient ∇G. On the other hand, the expectationG(y)=

_(y)[F(x)] alone, given by (11), involves a summation over

terms, and it may not be easily computed in polynomial time. To addressthis, in some embodiments, an approach is to produce random samples xand use these to produce an unbiased estimate of the gradient (see,e.g., [1]); this estimate can be used in Method 2 instead of thegradient. Before presenting the estimator tailored to MaxGC herein, thissampling-based estimator is described herein.

A Sampling-Based Estimator. Function G is linear when restricted to eachcoordinate y_(vi), for some v∈V, i∈

(i.e., when all inputs except y_(vi) are fixed). As a result, thepartial derivative of G w.r.t. y_(vi) can be written as:

$\begin{matrix}{\frac{\partial{G(y)}}{\partial y_{vi}} = {_{y}\left\lbrack {{{{{F(x)}\left. {x_{vi} = 1} \right\rbrack} - {_{y}\left\lbrack {{{F(x)}x_{vi}} = 0} \right\rbrack}} \geq 0},} \right.}} & (14)\end{matrix}$

where the last inequality is due to monotonicity of F. One can thusestimate the gradient by (a) producing T random samples

,

=1, . . . , T of the random vector x, consisting of independentBernoulli coordinates, and (b) computing, for each pair (v, i)∈V×

, the average

$\begin{matrix}{{\underset{\partial y_{ui}}{\overset{\bigwedge}{\partial{G(y)}}} = {\frac{1}{T}{\sum\limits_{ = 1}^{T}\left( {{F\left( {\left\lbrack x^{} \right\rbrack + \left( {v,i} \right)} \right)} - {F\left( \left\lbrack x^{} \right\rbrack_{- {({v,i})}} \right)}} \right)}}},} & (15)\end{matrix}$

where [x]_(+(v,i)),[x]_(−(v,i)) are equal to vector x with the (v, i)thcoordinate set to 1 and 0, respectively. Using this estimate, Method 2attains an approximation ratio arbitrarily close to 1−1/e forappropriately chosen T. In particular, the following theorem holds:

Theorem 2. [Calinescu et al. [1]] Consider Method 2, with ∇G(y_(k))replaced by the sampling-based estimate ∇

), given by (15). Set

${T = {\frac{10}{\delta^{2}}\left( {1 + {\ln \left( {{}{V}} \right)}} \right)}},$

and γ=δ, where

$\delta = {\frac{1}{40{}{{V} \cdot \left( {\sum\limits_{v \in V}c_{v}} \right)^{2}}}.}$

Then, the method terminates after K=1/γ=1/δ steps and, with highprobability,

G(y ^(K))≥(1−(1−δ)^(1/δ))G(y*)≥(1−1/e)G(y*),

where y* is an optimal solution to (13).

The proof of the theorem can be found in Appendix A of Calinescu et al.[1] for general submodular functions over arbitrary matroid constraints;embodiments state Theorem 2 here with constants T and γ set specificallyfor their objective G and the set of constraints

.

Complexity. Under this parametrization of T and γ, Method 2 runs inpolynomial time. More specifically, note that 1/δ=O(|

∥V|·(Σ_(v∈V)c_(v))²) is polynomial in the input size. Moreover, themethod runs for K=1/δ iterations in total. Each iteration requires

$T = {0\left( {\frac{1}{\delta^{2}}\left( {1 + {\ln \left( {{}{V}} \right)}} \right.} \right.}$

samples, each involving a polynomial computation (as F can be evaluatedin polynomial time). LP (12a) can be solved in polynomial time in thenumber of constraints and variables, which are O(|V∥

|). Finally, the rounding schemes presented in Appendix B are alsopoly-time, both requiring at most O(|V∥

|) steps.

5.2 A Novel Estimator Via Taylor Expansion

According to some embodiments, an approach to estimate the gradient viasampling has certain drawbacks. The number of samples T required toattain the 1−1/e ratio is quadratic in |V∥

|. In practice, even for networks and catalogs of moderate size (say,|V|=|

|=100), the number of samples becomes prohibitive (of the order of 10⁸).Producing an estimate for ∇G via a closed form computation that eschewssampling thus has significant computational advantages. In this section,embodiments illustrate that the multilinear relaxation of the cachinggain F admits such a closed-form characterization.

According to some embodiments, a polynomial f:

^(d)→

is in Weighted Disjunctive Normal Form (W-DNF) if it may be written as

f(x)=

·

_((s))(1−x _(j)),  (16)

for some index set

, positive coefficients β_(s)>0, and index sets I(s)⊆{1, . . . , d}.Intuitively, treating binary variables x_(j)∈{0,1} as boolean values,each W-DNF polynomial can be seen as a weighted sum (disjunction) amongproducts (conjunctions) of negative literals. These polynomials arisenaturally in the context of the problem; in particular:

Lemma 4. For all k≥1, x∈

, and e∈E, ρ_(e) ^(k)(x, λ) is a W-DNF polynomial whose coefficientsdepend on λ.

Proof (Sketch). The lemma holds for k=1 by (2) and (3). The lemmafollows by induction, as W-DNF polynomials over binary x∈

are closed under multiplication; this is because (1−

=(1−x) for all

≥1 when x∈{0,1}. A detailed proof be found in Appendix C.1.

Hence, all load powers are W-DNF polynomials. Expectations of W-DNFpolynomials have a remarkable property:

Lemma 5. Let f:

_(λ)→

be a W-DNF polynomial, and let x∈

be a random vector of independent Bernoulli coordinates parameterized byy∈

. Then

_(y)[f(x)]=f((y), where f(y) is the evaluation of the W-DNF polynomialrepresenting f over the real vector y.

Proof. As f is W-DNF, it may be written as

${f(x)} = {\sum\limits_{s \in }{\beta_{s}{\prod\limits_{t \in \underset{\_}{}}\left( {1 - x_{t}} \right)}}}$

for appropriate

, and appropriate β_(s),

(s), where s∈

.

Hence,

$\begin{matrix}{{_{y}\left\lbrack {f(x)} \right\rbrack} = {\sum\limits_{s \in }{\beta_{s}{_{y}\left\lbrack {\left( {1 - x_{t}} \right)} \right\rbrack}}}} \\{{{\sum\limits_{s \in }{\beta_{s}\left( {1 - {_{y}\left\lbrack x_{t} \right\rbrack}} \right)}},{{by}\mspace{14mu} {independence}}}} \\{= {\sum\limits_{s \in }{\beta_{s}{\left( {1 - y_{t}} \right).}}}}\end{matrix}$

Lemma 5 states that, to compute the expectation of a W-DNF polynomial fover i.i.d. Bernoulli variables with expectations y, it suffices toevaluate f over input y. Expectations computed this way therefore do notrequire sampling.

Embodiments leverage this property to approximate ∇G(y) by taking theTaylor expansion of the cost functions C_(e) at each edge e∈E.Embodiments enable C_(e) as a power series w.r.t. ρ_(e) ^(k), k≥1; fromLemmas 4 and 5, embodiments can compute the expectation of this seriesin a closed form. In particular, by expanding the series and rearrangingterms it is easy to show the following lemma:

Lemma 6. Consider a cost function C_(e): [0,1)→

₊ which satisfies Assumption 1 and for which the Taylor expansion existsat some ρ*∈[0,1). Then, for x∈

a random Bernoulli vector parameterized by y∈

,

$\begin{matrix}{\frac{\partial{G(y)}}{\partial y_{vi}} \approx {\sum\limits_{e \in E}{\sum\limits_{k = 1}^{L}{\alpha_{e}^{(k)}\left\lbrack {{\rho_{e}^{k}\left( {\lbrack y\rbrack_{- {({v,i})}},\lambda} \right)} - {\rho_{e}^{k}\left( {\lbrack y\rbrack_{+ {({v,i})}},\lambda} \right)}} \right\rbrack}}}} & (17)\end{matrix}$

where,

$\alpha_{e}^{(k)} = {\sum\limits_{i = k}^{L}{\frac{\left( {- 1} \right)^{i - k}\begin{pmatrix}i \\k\end{pmatrix}}{i!}{C_{e}^{(i)}\left( \rho^{*} \right)}\left( \rho^{*} \right)^{i - k}}}$

for k=0, 1, . . . , L, and the error of the approximation is:

$\frac{1}{\left( {L + 1} \right)!}\Sigma_{e \in E}{{{C_{e}^{({L + 1})}\left( \rho^{\prime} \right)}\left\lbrack {{_{{\lbrack y\rbrack}_{{- v},i}}\left\lbrack \left( {{\rho_{e}\left( {x,\lambda} \right)} - \rho^{*}} \right)^{L + 1} \right\rbrack} - {_{{\lbrack y\rbrack}_{{+ v},i}}\left\lbrack \left( {{\rho_{e}\left( {x,\lambda} \right)} - \rho^{*}} \right)^{L + 1} \right\rbrack}} \right\rbrack}.}$

Proof. The Taylor expansion of C_(e) at ρ* is given by:

${{C_{e}(\rho)} = {{C_{e}\left( \rho^{*} \right)} + {\sum\limits_{k = 1}^{L}\; {\frac{1}{k!}{C_{e}^{(k)}\left( \rho^{*} \right)}{\left( {\rho - \rho^{*}} \right)^{k}++}\frac{1}{\left( {L + 1} \right)!}{C_{e}^{({L + 1})}\left( \rho^{\prime} \right)}\left( {\rho - \rho^{*}} \right)^{L + 1}}}}},$

where ρ′∈[ρ*, ρ] and C_(e) ^((k)) is the k-th order derivative of C_(e).By expanding this polynomial and reorganizing the terms, someembodiments get

${{C_{e}(\rho)} = {{\sum\limits_{k = 0}^{L}\; {\alpha_{e}^{(k)}\rho^{(k)}}} + {\frac{1}{\left( {L + 1} \right)!}{C_{e}^{({L + 1})}\left( \rho^{\prime} \right)}\left( {\rho - \rho^{*}} \right)^{L + 1}}}},{where}$${\alpha_{e}^{(k)} = {\sum\limits_{i = k}^{L}\; {\frac{\left( {- 1} \right)^{i - k}\begin{pmatrix}i \\k\end{pmatrix}}{i!}{C_{e}^{(i)}\left( \rho^{*} \right)}\left( \rho^{*} \right)^{i - k}}}},$

for k=0, 1, . . . , L. Consider now the L-th order Taylor approximationof C_(e), given by

${{\hat{C}}_{e}(\rho)} = {\sum\limits_{k = 0}^{L}\; {\alpha_{e}^{(k)}{\rho^{k}.}}}$

Clearly, this is an estimator of C_(e), with an error of the order|C_(e)(ρ)−Ĉ_(e)(ρ)|=o((ρ−ρ_(*))^(L)). Thus, for x∈

a random Bernoulli vector parameterized by y∈

,

$\begin{matrix}{{{_{y}\left\lbrack {C_{e}\left( {\rho_{e}\left( {x,\lambda} \right)} \right)} \right\rbrack} \approx {_{y}\left\lbrack {{\hat{C}}_{e}\left( {\rho_{e}\left( {x,\lambda} \right)} \right)} \right\rbrack}} = {\sum\limits_{k = 0}^{L}\; {\alpha_{e}^{(k)}{_{y}\left\lbrack {\rho_{e}^{k}\left( {x,\lambda} \right)} \right\rbrack}}}} & (18)\end{matrix}$

On the other hand, for all v∈V and i∈

:

$\begin{matrix}\begin{matrix}{\frac{\partial{G(y)}}{\partial y_{vi}}\overset{(14)}{=}} & {{{_{y}\left\lbrack {{{F(x)}x_{vi}} = 1} \right\rbrack} - {_{y}\left\lbrack {{{F(x)}x_{vi}} = 0} \right\rbrack}}} \\{\overset{({9a})}{=}} & {{{_{y}\left\lbrack {{{C(x)}x_{vi}} = 0} \right\rbrack} - {_{y}\left\lbrack {{{C(x)}x_{vi}} = 1} \right\rbrack}}} \\{\overset{{({7a})},{(18)}}{\approx}} & {{\sum\limits_{e \in E}{\sum\limits_{k = 1}^{L}\; {\alpha_{e}^{k}\left( {{_{y}\left\lbrack {{{\rho_{e}^{k}\left( {x,\lambda} \right)}x_{vi}} = 0} \right\rbrack} -} \right.}}}} \\ & {\left. {_{y}\left\lbrack {{{\rho_{e}^{k}\left( {x,\lambda} \right)}x_{vi}} = 1} \right\rbrack} \right),}\end{matrix} & (19)\end{matrix}$

where the error of the approximation is given by

$\frac{1}{\left( {L + 1} \right)!}{\sum\limits_{e \in E}{{C_{e}^{({L + 1})}\left( \rho^{\prime} \right)}{\quad\left\lbrack {{_{y}\left\lbrack {{\left( {{\rho_{e}\left( {x,\lambda} \right)} - \rho^{*}} \right)^{L + 1}x_{vi}} = 0} \right\rbrack} - {_{y}\left\lbrack {{\left( {{\rho_{e}\left( {x,\lambda} \right)} - \rho^{*}} \right)^{L + 1}x_{vi}} = 1} \right\rbrack}} \right\rbrack}}}$

The lemma thus follows from Lemmas 4 and 5.

Estimator (17) is deterministic: no random sampling is required.Moreover, Taylor's theorem enables characterizing the error (i.e., thebias) of this estimate. We use this to characterize the final fractionalsolution y produced by Method 2:

Theorem 3. Assume that all C_(e), e∈E, satisfy Assumption 1, areL+1-differentiable, and that all their L+1 derivatives are bounded byW≥0. Then, consider Method 2, in which ∇G(y_(k)) is estimated via theTaylor estimator (17), where each edge cost function is approximated atρ_(e)*=

_(y) _(k) [ρ_(e)(x,λ)]=ρ_(e)(γ_(k),λ). Then,

$\begin{matrix}{{{G\left( y_{K} \right)} \geq {{\left( {1 - \frac{1}{e}} \right){G\left( y^{*} \right)}} - {2{DB}} - \frac{P}{2K}}},} & (20)\end{matrix}$

where

$K = \frac{1}{\gamma}$

is the number of iterations, y* is an optimal solution to (13),

${D = {{{y}_{2}} \leq {{V} \cdot {\max\limits_{v \in V}c_{v}}}}},$

is the diameter of

,

$B \leq \frac{W{E}}{\left( {L + 1} \right)!}$

is the bias of estimator (17), and P=2C(x₀), is a Lipschitz constant of∇G.

The proof can be found in Appendix C.2. The theorem immediately impliesthat some embodiments can replace (17) as an estimator in Method 2, andattain an approximation arbitrarily close to 1−1/e.

Estimation via Power Series. For arbitrary L+1-differentiable costfunctions C_(e), the estimator (17) can be leveraged by replacing C_(e)with its Taylor expansion. In the case of queue-dependent costfunctions, as described in Example 4 of Section 3.2, the power-series(8) can be used instead. For example, the expected queue size (Example1, Section 3.2), is given by

${C_{e}\left( \rho_{e} \right)} = {\frac{\rho_{e}}{1 - \rho_{e}} = {\sum\limits_{k = 1}^{\infty}\; {\rho_{e}^{k}.}}}$

In contrast to the Taylor expansion, this power series does not dependon a point ρ_(e)* around which the function C_(e) is approximated.

6 BEYOND M/M/1 QUEUES

As discussed in Section 3 herein, the classes of M/M/1 queues for whichthe supermodularity of the cost functions arises is quite broad, andincludes FIFO, LIFO, and processor sharing queues. According to someembodiments, Section 6 herein illustrates how results extend to evenbroader families of queuing networks. Chapter 3 of Kelly [2] provides ageneral framework for a set of queues for which service times areexponentially distributed; also summarized in Appendix A.2 herein. Alarge class of networks can be modeled by this framework, includingnetworks of M/M/k queues; all such networks maintain the property thatsteady-state distributions have a product form. This enables someembodiments to extend results to M/M/k queues for two cost functionsC_(e):

Lemma 7. For a network of M/M/k queues, both the queuing probability andthe expected queue size are non-increasing and supermodular over sets{supp(x): x∈

_(λ)}. This is given by the so-called Erlang C formula [31].

According to some embodiments, as an immediate consequence of Lemma 7and Little's theorem, both the sum of the expected delays per queue, butalso the expected delay of an arriving packet, are also supermodular andnon-decreasing.

Product-form steady-state distributions arise also in settings whereservice times are not exponentially distributed. A large class ofquasi-reversible queues, named symmetric queues exhibit this property(c.f. Section 3.3 of [2] and Chapter 10 of [4]). Symmetric queues arepresented in Appendix A.3 herein. In the following lemma embodimentsleverage the product form of symmetric queues to extend results to M/D/1symmetric queues [31].

Lemma 8. For a network of M/D/1 symmetric queues, the expected queuesize is non-increasing and supermodular over sets {supp(x): x∈

_(λ)}.

Again, Lemma 8 and Little's theorem imply that this property alsoextends to network delays. It is worth noting that conclusions similarto these in Lemmas 7 and 8 are not possible for all general queues withproduct form distributions. In particular, also embodiments prove thefollowing negative result:

Lemma 9. There exists a network of M/M/1/k queues, containing a queue e,for which no strictly monotone function C_(e) of the load ρ_(e) at aqueue e is non-increasing and supermodular over sets {supp(x): x∈

_(λ)}. In particular, the expected size of queue e is neither monotonenor supermodular.

The proofs for all the Lemmas of this section, i.e., Lemmas 7 to 9 canbe found in Section D.

7 NUMERICAL EVALUATION

TABLE 1 Graph Topologies and Experiment Parameters Graph |V| |E| | 

 | | 

 | |Q| c_(v) F_(PL)(x_(RND)) F_(UNI)(x_(RND)) ER 100 1042 300 1K 4 32.75 2.98 ER-20Q 100 1042 300 1K 20 3 3.1 2.88 HC 128 896 300 1K 4 32.25 5.23 HC-20Q 128 896 300 1K 20 3 2.52 5.99 star 100 198 300 1K 4 36.08 8.3 path 4 3 2  2 1 1 1.2 1.2 dtelekom 68 546 300 1K 4 3 2.57 3.66abilene 11 28 4  2 2 1/2 4.39 4.39 geant 22 66 10 100 4 2 19.68 17.22

Networks. Embodiments employ Methods 1 and 2 over 9 network topologies,summarized in Table 2. Graphs ER and ER-20Q are the same 100-nodeErdős-Rényi graph with parameter p=0.1. Graphs HC and HC-20Q are thesame hypercube graph with 128 nodes, and graph star is a star graph with100 nodes. The graph path is the topology shown in FIG. 2. The last 3topologies, namely, dtelekom, geant, and abilene represent the DeutscheTelekom, GEANT, and Abilene backbone networks, respectively. The latteris also shown in FIG. 3.

Experimental Setup. For path and abilene, embodiments set demands,storage capacities, and service rates as illustrated in FIGS. 2 and 3,respectively. Both of these settings induce an approximation ratio closeto ½ for greedy. For all remaining topologies, embodiments consider acatalog of size |

| objects; for each object, embodiments select 1 node uniformly atrandom (u.a.r.) from V to serve as the designated server for thisobject. To induce traffic overlaps, embodiments also select |Q| nodesu.a.r. that serve as sources for requests; all requests originate fromthese sources. All caches are set to the same storage capacity, i.e.,c_(v)=c for all v∈V. Embodiments generate a set of |

| possible types of requests. For each request type r∈

, λ^(r)=1 request per second, and path p^(r) is generated by selecting asource among the |Q| sources u.a.r., and routing towards the designatedserver of object i^(r) using a shortest path method. Embodimentsconsider two ways of selecting objects i^(r)∈

: in the uniform regime, i^(r) is selected u.a.r. from the catalog

; in the power-law regime, i^(r) is selected from the catalog

via a power law distribution with exponent 1.2. All the parametervalues, e.g., catalog size |

|, number of requests |

|, number of query sources |Q|, and caching capacities c_(v) arepresented in Table 2.

Embodiments construct heterogeneous service rates as follows. Everyqueue service rate is either set to a low value μ_(e)=μ_(low) or a highvalue μ_(e)=μ_(high), for all e∈E. Embodiments select μ_(low) andμ_(high) as follows. Given the demands r∈

and the corresponding arrival rates λ^(r), some embodiments compute thehighest load under no caching (x=0), i.e., some embodiments findλ_(max)=max_(e∈E)Σ_(r:e∈p) _(r) λ^(r). We then set μ_(low)=λ_(max)×1.05and μ_(high)=λ_(max)×200. We set the service rate to μ_(low) for allcongested edges, i.e., edges e s.t. λ_(e)=λ_(max). Embodiments set theservice rate for each remaining edge e∈E to μ_(low) independently withprobability 0.7, and to μ_(high) otherwise. Note that, as a result 0∈

_(λ)=

, i.e., the system is stable even in the absence of caching and, onaverage, 30 percent of the edges may have a high service rate.

Placement Methods. Some embodiments include several placement methods:(a) Greedy, i.e., the greedy method (Alg. 1), (b) Continuous-Greedy withRandom Sampling (CG-RS), i.e., Method 2 with a gradient estimator basedon sampling, as described in Section 5.1, (c) Continuous-Greedy withTaylor approximation (CGT), i.e., Method 2 with a gradient estimatorbased on the Taylor expansion, as described in Section 5.2, and (d)Continuous-Greedy with Power Series approximation (CG-PS), i.e., Method2 with a gradient estimator based on the power series expansion,described also in Section 5.2. In the case of CG-RS, some embodimentscollect 500 samples, i.e., T=500. In the case of CG-PS, some embodimentsinclude the first and second order expansions of the power series asCG-PS1 and CG-PS2, respectively. In the case of CGT, some embodimentsinclude the first-order expansion (L=1). In both cases, subsequent tothe execution of Method 2 some embodiments include produce an integralsolution in

by rounding via the swap rounding method [33]. All continuous-greedymethods use γ=0.001. Some embodiments also implement a random selectionmethod (RND), which caches c_(v) items at each node v∈V, selecteduniformly at random. Some embodiments repeat RND 10 times, and reportthe average running time and caching gain.

Caching Gain Across Different Topologies. The caching gain F(x) for xgenerated by different placement methods, is shown for power-law arrivaldistribution and uniform arrival distribution in FIGS. 4a and 4b ,respectively. The values are normalized by the gains obtained by RND,reported in Table 2. Also, the running times of the methods forpower-law arrival distribution are reported in FIG. 5. As is illustratedin FIGS. 4A-B, Greedy is comparable to other methods in most topologies.However, for topologies path and abilene Greedy obtains a sub-optimalsolution, in comparison to the continuous-greedy method. In fact, forpath and abilene Greedy performs even worse than RND. In FIGS. 4A-B,according to some embodiments, the continuous-greedy methods withgradient estimators based on Taylor and Power series expansion, i.e.,CG-PS1, CG-PS2, and CGT outperform CG-RS500 in most topologies. Also,FIG. 5 illustrates that CG-RS500 runs 100 times slower than thecontinuous-greedy methods with first-order gradient estimators, i.e.,CG-PS1 and CGT. Note that 500 samples are significantly below the value,stated in Theorem 2, needed to attain the theoretical guarantees of thecontinuous-greedy method, which is quadratic in |V∥

|.

Varying Service Rates. For topologies path and abilene, theapproximation ratio of Greedy is ≈0.5. This ratio is a function ofservice rate of the high-bandwidth link M. In this experiment, someembodiments explore the effect of varying M on the performance of themethods in more detail. Some embodiments plot the caching gain obtainedby different methods for path and abilene topologies, using differentvalues of M∈{M_(min), 10,20,200}, where M_(min) is the value that putsthe system on the brink of instability, i.e., 1 and 2+ε for path andabilene, respectively. Thus, some embodiments gradually increase thediscrepancy between the service rate of low-bandwidth and high-bandwidthlinks. The corresponding caching gains are plotted in FIGS. 6A-B, as afunction of M. Some embodiments see that as M increases the gainattained by Greedy worsens in both topologies: when M=M_(min) Greedymatches the performance of the continuous-greedy methods, in both cases.However, for higher values of M it is beaten not only by all variationsof the continuous-greedy method, but by RND as well.

Effect of Congestion on Caching Gain. In this experiment, someembodiments study the effect of varying arrival rates λ^(r) on cachinggain F. Some embodiments report results only for the dtelekom and ERtopologies and power-law arrival distribution. Some embodiments obtainthe cache placements x using the parameters presented in Table 2 anddifferent arrival rates: λ^(r)∈{0.65,0.72,0.81,0.9,1.0}, for r∈

. FIGS. 7A-B show the caching gain attained by the placement methods asa function of arrival rates. Some embodiments observe that as arrivalrates increase, the caching gain attained by almost all methods, exceptRND, increases significantly. Moreover, CG-PS1, CG-PS2, CGT, and Greedyhave a similar performance, while CG-RS500 achieves lower caching gains.

Varying Caching Capacity. In this experiment, some embodiments study theeffect of increasing cache capacity c_(v) on the acquired caching gains.Again, some embodiments report the results only for the dtelekom and ERtopologies and power-law arrival distribution. We evaluate the cachinggain obtained by different placement methods using the parameters ofTable 2 and different caching capacities: c_(v)∈{1,3,10,30} for v∈V. Thecaching gain is plotted in FIGS. 8A-B. As some embodiments illustrate,in all cases the obtained gain increases, as caching capacitiesincrease. This is expected: caching more items reduces traffic anddelay, increasing the gain.

8 CONCLUSIONS

Embodiments herein provide feasible object placements targeting manydesign objectives of interest, including system size and delay, can bedetermined using combinatorial techniques. Some embodiments includecharacterization of approximable objectives for certain classes ofqueues, including M/M/1/k queues, open. Some embodiments also solveproblems relating to stability. Some embodiments performcharacterization of the stability region of arrival rates Λ=

Λ(x). Some embodiments determine membership in this set (or,equivalently, given λ, determining whether there exists a x∈

under which the system is stable) is NP-hard or not, and this region canbe approximated by embodiments. Methods presented herein may be offline:identifying how to determine placements in an online, distributedfashion, in a manner that attains a design objective (as in [8], [21]),or even stabilizes the system (as in [7]), remains an important openproblem.

APPENDIX A

Kelly Networks and Networks of Symmetric Queues

A.1 Kelly Networks Under M/M/1 Queues

Kelly networks [2], [3], [4] (i.e., multi-class Jackson networks) arenetworks of queues operating under a fairly general service disciplines(including FIFO, LIFO, and processor sharing, to name a few). Asillustrated in FIG. 1A, a Kelly network can be represented by a directedgraph G (V, E), in which each edge is associated with a queue. In thecase of First-In First-Out (FIFO) queues, each edge/link e∈E isassociated with an M/M/1 queue with service rate μ_(e)≥0. In an opennetwork, packets of exogenous traffic arrive, are routed throughconsecutive queues, and subsequently exit the network; the path followedby a packet is determined by its class.

Formally, let

be the set of packet classes. For each packet class r∈

, embodiments denote by p^(r)⊆V the simple path of adjacent nodesvisited by a packet. Packets of class r∈

arrive according to an exogenous Poisson arrival process with rateλ^(r)>0, independent of other arrival processes and service times. Uponarrival, a packet travels across nodes in p^(r), traversing intermediatequeues, and exits upon reaching the terminal node in p^(r).

A Kelly network forms a Markov process over the state space determinedby queue contents. In particular, let n_(e) ^(r) be the number ofpackets of class r∈

in queue e∈E, and n_(e)=

n_(e) ^(r) be the total queue size. The state of a queue n_(e)∈

^(n) ^(e) , e∈E, is the vector of length n_(e) representing the class ofeach packet in each position of the queue. The system state is thengiven by n=[n_(e)]_(e∈E); embodiments denote by Ω the state space ofthis Markov process.

The aggregate arrival rate λ_(e) at an edge e∈E is given byλ_(e)=Σ_(r:e∈p) _(r) λ^(r), while the load at edge e∈E is given byρ_(e)=λ_(e)/μ_(e). Kelly's extension of Jackson's theorem [2] statesthat, if ρ_(e)<1 for all e∈E, the Markov process {n(t); t≥0}_(t≥0) ispositive recurrent, and its steady-state distribution has the followingproduct form:

$\begin{matrix}{{{\pi (n)} = {\underset{e \in E}{\Pi}{\pi_{e}\left( n_{e} \right)}}},{n \in \Omega},{where}} & (21) \\{{\pi_{e}\left( n_{e} \right)} = {\left( {1 - \rho_{e}} \right){\left( \frac{\lambda^{r}}{\mu_{e}} \right)^{n_{e}^{r}}.}}} & (22)\end{matrix}$

As a consequence, the queue sizes n_(e), e∈E, also have a product formdistribution in steady state, and their marginals are given by:

P[n _(e) =k]=(1−ρ_(e))ρ_(e) ^(k) ,k∈

.  (23)

A.2 General Kelly Networks

The steady-state distribution (21) holds for many different serviceprinciples beyond FIFO (c.f. Section 3.1 of [2]). In short, incomingpackets can be placed in random position within the queue according to agiven distribution, and the (exponentially distributed) service effortcan be split across different positions, possibly unequally; bothplacement and service effort distributions are class-independent. Thiscaptures a broad array of policies including FIFO, Last-In First-Out(LIFO), and processor sharing: in all cases, the steady-statedistribution is given by (21).

In general Kelly networks, queue e∈{1, 2, . . . , |E|}, assuming itcontains n_(e) packets in the queue, operates in the following manner:

-   -   1) Each packet (customer) requires an exponentially distributed        amount of service.    -   2) A total service effort is provided by queue e at the rate        μ_(e)(n_(e)).    -   3) The packet in position l in the queue is provided with a        portion γ_(e)(l, n_(e)) of the total service effort, for l=1, 2,        . . . , n_(e); when this packet completes service and leaves the        queue, packets in positions l+1, l+2, . . . , n_(e) move down to        positions l, l+1, . . . , n_(e)−1, respectively.    -   4) An arriving packet at queue j moves into position l, for        l=1,2, . . . , n_(e), with probability δ_(e)(l, n_(e)+1);        packets that where in positions l, l+1, . . . , n_(e)+1, move up        to positions l+1, l+2, . . . , n_(e)+1, respectively.

Clearly, some embodiments include μ_(e)(n_(e))>0 for n_(e)>0; inaddition,

$\begin{matrix}{{{\sum\limits_{l = 1}^{n_{e}}\; {\gamma_{e}\left( {l,n_{e}} \right)}} = 1},} & (24) \\{{\sum\limits_{l = 1}^{n_{e}}\; {\delta_{e}\left( {l,n_{e}} \right)}} = 1.} & (25)\end{matrix}$

Kelly's theorem [2] states that, if ρ_(e)<1 for all e∈E, the state ofqueue e in equilibrium is independent of the rest of the system, hence,it may have a product form. In addition, the probability that queue econtains n_(e) packets is

$\begin{matrix}{{{\pi_{e}\left( n_{e} \right)} = {b_{e}\frac{\lambda_{e}^{n_{e}}}{\prod\limits_{l = 1}^{n_{e}}\; {\mu_{e}(l)}}}},} & (26)\end{matrix}$

where b_(e) is the normalizing factor. As is illustrated from (30), notethat the steady-state distribution may not be a function of γ_(e)'s, andδ_(e)(l, n_(e)+1)'s, and hence, is independent of the packet placementand service allocation distributions.

According to some embodiments, by allowing μ_(e)(l)=mu_(e), embodimentsobtain the results in (23).

A.3 Networks of Symmetric Queues

Let n_(e) be the number of packets placed in positions 1,2, . . . , n inqueue e∈E. Queue e is defined as symmetric queue if it operates in thefollowing manner

-   -   1) The service requirement of a packet is a random variable        whose distribution may depend upon the class of the customer.    -   2) A total service effort is provided by queue e at the rate        μ_(e)(n_(e)).    -   3) The packet in position l in the queue is provided with a        portion γ_(e)(l, n_(e)) of the total service effort, for l=1, 2,        . . . , n_(e); when this packet completes service and leaves the        queue, packets in positions l+1, l+2, . . . , n_(e) move down to        positions l, l+1, . . . , n_(e)−1, respectively.    -   4) An arriving packet at queue e moves into position l, for l=1,        2, . . . , n_(e), with probability γ_(e)(l, n_(e)+1); packets        that where in positions l, l+1, . . . , n_(e)+1, move up to        positions l+1, l+2, . . . , n_(e)+1, respectively.

Similarly, some embodiments include μ_(e)(n_(e))>0 for n_(e)>0; inaddition,

$\begin{matrix}{{{\sum\limits_{l = 1}^{n_{e}}\; {\gamma_{e}\left( {l,n_{e}} \right)}} = 1},} & (27)\end{matrix}$

As shown in [2], and [4], symmetric queues have product formsteady-state distributions. In particular, it turns out the probabilityof there are n_(e) packets in queue e is similar to that given by (26).

APPENDIX B

Rounding

Several poly-time methods can be used to round the fractional solutionthat is produced by Method 2 to an integral x∈

. Embodiments may include such rounding methods: pipage rounding [20],which is deterministic, and swap-rounding [33], which may be randomized.Embodiments may also include [8], [20] for pipage rounding, and [33] forswap rounding.

According to some embodiments, pipage rounding uses the followingproperty of G: given a fractional solution y∈

, there are at least two fractional variables y_(vi) and y_(v′i′), suchthat transferring mass from one to the other, 1) makes at least one ofthem 0 or 1, 2) the new ŷ remains feasible in

, and 3) G(ŷ)≥G(y(1)), that is, the expected caching gain at ŷ is atleast as good as y. This process is repeated until ŷ does not have anyfractional element, at which point pipage rounding terminates and returnŷ. This procedure has a run-time of O(|V∥

|) [8], and since each rounding step can only increase G, it followsthat the final integral ŷ∈

must satisfy

${{F\left( \hat{y} \right)} = {{G\left( \hat{y} \right)} \geq {\left\lbrack {G(y)} \right\rbrack} \geq {\left( {1 - \frac{1}{e}} \right){G\left( y^{*} \right)}} \geq {\left( {1 - \frac{1}{e}} \right){F\left( x^{*} \right)}}}},$

where x* is an optimal solution to MaxCG. Here, the first equality holdsbecause F and G are equal when their arguments are integral, while thelast inequality holds because (13) is a relaxation of MaxCG, maximizingthe same objective over a larger domain.

According to some embodiments, in swap rounding, given a fractionalsolution y∈

produced by Method 2 observe that it may be written as a convexcombination of integral vectors in

, i.e., y=Σ_(k=1) ^(K)γ_(k)m_(k), where γ_(k)∈[0,1], Σ_(k=1)^(K)γ_(k)=1, and m_(k)∈

. Moreover, by construction, each such vector m_(k) is maximal, in thatall capacity constraints are tight. Swap rounding iteratively mergesthese constituent integral vectors, producing an integral solution. Ateach iteration i, the present integral vector c_(k) is merged withm_(k+1)∈

into a new integral solution c_(k+1)∈

as follows: if the two solutions c_(k), m_(k+1) differ at a cache v∈V,items in this cache are swapped to reduce the set difference: either anitem i in a cache in c_(k) replaces an item j in M_(k+1), or an item jin M_(k+1) replaces an item i in c_(k); the former occurs withprobability proportional to

γ_(l), and the latter with probability proportional to γ_(k+1). Theswapping is repeated until the two integer solutions become identical;this merged solution becomes c_(k+1). This process terminates after K−1steps, after which all the points m_(k) are merged into a singleintegral vector c_(K)∈

. Observe that, in contrast to pipage rounding, swap rounding does notrequire any evaluation of the objective F during rounding. This makesswap rounding significantly faster to implement; this comes at theexpense of the approximation ratio, however, as the resulting guarantee1−1/e is in expectation.

APPENDIX C

Continuous Greedy with Taylor-Expansion Gradient Estimation

C.1 Proof of Lemma 4

Embodiments prove this by induction on k≥1. Observe first that, by (3),the load on each edge e=(u, v)∈E can be written as a polynomial of thefollowing form:

$\begin{matrix}{{{\rho_{e}\left( {x,\lambda} \right)} = {{{\beta_{r}(\lambda)} \cdot}\left( {1 - x_{j}} \right)}},} & (28)\end{matrix}$

for appropriately defined

_(e)=

_((u,v)) ={r∈

:(v,u)∈p ^(r)},

_(e)(r)={(w,i ^(p))∈V×

:w∈p ^(r) ,k _(p) _(r) (w)≤k _(p) _(r) (v)}, and

β_(r)(λ)=λ^(r)/μ_(e),

In other words, ρ_(e):

_(λ)→

is a W-DNF polynomial. For the induction step, observe that W-DNFpolynomials, seen as functions over the integral domain

_(λ), are closed under multiplication. In particular, the followinglemma holds:

Lemma 10. Given two W-DNF polynomials f₁:

_(λ)→

and f₂:

_(λ)→

, given by

f₁(x) = β_(r)(1 − x_(t)), and${{f_{2}(x)} = {\sum\limits_{r \in _{2}}{\beta_{r}{\sum\limits_{i \in {\mathcal{E}_{2}{(r)}}}\left( {1 - x_{t}} \right)}}}},$

their product f₁·f₂ is also a W-DNF polynomial over

_(λ), given by:

( f 1 · f 2 )  ( x ) = r  β r  β r ′   ( 1 - x i )

Proof. To see this, observe that

f₁(x)f₁(x) = (1 − x_(t))²(1 − x_(t))

where Δ is the symmetric set difference. On the other hand, as(1−x_(t))∈{0,1}, embodiments include that (1−x_(t))²=(1−x_(t)), and thelemma follows.

Hence, if ρ_(e) ^(k)(x, λ) is a W-DNF polynomial, by (28) and Lemma 10,so is ρ_(e) ^(k+1)(x, λ).

C.2 Proof of Theorem 3

Some embodiments improve by bounding the bias of estimator (19). Indeed,given a set of continuous functions {C_((u,v)}_((u,v)∈E) where theirfirst L+1 derivatives within their operating regime, [0,1), areupperbounded by a finite constant, W, the bias of estimator z≡[z_(vi)

, where z_(vi) is defined by (21), is given by

$\begin{matrix}\begin{matrix}{B \equiv} & {{{z - {\nabla{G(y)}}}}_{2}} \\{=} & {{{{\sum\limits_{e \in E}{\frac{1}{\left( {L + 1} \right)!}{C_{e}^{({L + 1})}\left( \rho_{e}^{\prime} \right)}\left( {\rho_{e} - \rho_{e}^{*}} \right)^{L + 1}}}}_{2},}}\end{matrix} & (29)\end{matrix}$

where ρ′_(e)∈[ρ_(e)*,ρ_(e)]. To compute the bias, According to someembodiments, ρ_(e), ρ_(e)*∈[0,1]. Specifically, some embodiments assumeρ_(e), ρ_(e)*∈[0,1). Hence, |ρ_(e)−ρ_(e)*|≤1, and C_(e)^((L+1))(ρ′_(e))≤max{C_(e) ^((L+1))(ρ_(e)), C_(e) ^((L+1))(ρ_(e)*)}<∞.In particular, let W=max_(e∈E)C_(e) ^((L+1))(ρ′_(e)). Then, it is easyto compute the following upper bound on the bias of z:

$\begin{matrix}{B \leq {\frac{W{E}}{\left( {L + 1} \right)!}.}} & (30)\end{matrix}$

In addition, note that G is linear in y_(vi), and hence [1]:

$\begin{matrix}{{\frac{\partial G}{\partial y_{vi}} = {{{\left\lbrack {{{F(x)}x_{vi}} = 1} \right\rbrack} - {\left\lbrack {{{F(x)}x_{vi}} = 0} \right\rbrack}} = {{{\left\lbrack {{{C(x)}x_{vi}} = 0} \right\rbrack} - {\left\lbrack {{{C(x)}x_{vi}} = 1} \right\rbrack}} \geq 0}}},} & (31)\end{matrix}$

which is ≥0 due to monotonicity of F(x). It is easy to verify that

$\frac{\partial^{2}G}{\partial y_{vi}^{2}} = 0.$

For (v₁, i₁)≠(v₂, i₂), embodiments can compute the second derivative ofG [1] as given by

${\frac{\partial^{2}G}{{\partial y_{v_{1}i_{1}}}{\partial y_{v_{2}i_{2}}}} = {{{\left\lbrack {{{{C(x)}x_{v_{1}i_{1}}} = 1},{x_{v_{2}i_{2}} = 0}} \right\rbrack} + {\left\lbrack {{{{C(x)}x_{v_{1}i_{1}}} = 0},{x_{v_{2}i_{2}} = 1}} \right\rbrack} - {\left\lbrack {{{{C(x)}x_{v_{1}i_{1}}} = 1},{x_{v_{2}i_{2}} = 1}} \right\rbrack} - {\left\lbrack {{{{C(x)}x_{v_{1}i_{1}}} = 0},{x_{v_{2}i_{2}} = 0}} \right\rbrack}} \leq 0}},$

which is ≤0 due to the supermodularity of C(x). Hence, G(y) iscomponent-wise concave [1].

In additions, it is easy to see that for y∈

, ∥G(y)∥, ∥∇G(y)∥, and ∥∇²G(y)∥ are bounded by C(x₀), C(x₀) and 2C(x₀),respectively. Consequently, G and ∇G are P-Lipschitz continuous, withP=2C(x₀).

In the kth iteration of the Continuous Greedy method, let m*=m*(y_(k)):=(y*∨(y_(k)+y₀))−y_(k)=(y*−y_(k))∨y₀≥y₀, where x∨y:=(max{x_(i),y_(i)})_(i). Since m*≤y* and

is closed-down, m*∈

. Due to monotonicity of G, it follows

G(y _(k) +m*)≥G(y*).  (32)

Some embodiments include univariate auxiliary functiong_(y,m)(ξ):=G(y+ξm), ξ∈[0,1], m∈

. Since G(y) is component-wise concave, then, g_(y,m)(ξ) is concave in[0,1]. In addition, since g_(y) _(k) _(,m*)(ξ)=G(y_(k)+ξm*) is concavefor ξ∈[0,1], it follows

$\begin{matrix}{{{g_{y_{k},m^{*}}(1)} - {g_{y_{k},m^{*}}(0)}} = {{{{G\left( {y_{k} + m^{*}} \right)} - {G\left( y_{k} \right)}} \leq {\frac{{dg}_{y_{k},m}(0)}{d\; \xi} \times 1}} = {{\langle{m^{*},{\nabla{G\left( y_{k} \right)}}}\rangle}.}}} & (33)\end{matrix}$

Now let m_(k) be the vector chosen by Method 2 in the kth iteration. Wehave

m _(k) ,z(y _(k))

≥

m*,z(y _(k))

.  (34)

For the LHS, embodiments may include

$\begin{matrix}{{\langle{m_{k},z}\rangle} = {{{\langle{m_{k},{\nabla{G\left( y_{k} \right)}}}\rangle} + {\langle{m_{k},{z - {\nabla{G\left( y_{k} \right)}}}}\rangle}}\overset{(i)}{\leq}{{\langle{m_{k},{\nabla{G\left( y_{k} \right)}}}\rangle} + {{m_{k}}_{2} \cdot {{z - {\nabla{G\left( y_{k} \right)}}}}^{2}}} \leq {{\langle{m_{k},{\nabla{G\left( y_{k} \right)}}}\rangle} + {{DB}.}}}} & (35)\end{matrix}$

where

${D = {{\max_{m \in \overset{\sim}{}}{m}_{2}} \leq {{V} \cdot {\max\limits_{v \in V}c_{v}}}}},$

is the upperbound on the diameter of

, B is as defined in (30), and (i) follows from Cauchy-Schwarzinequality. Similarly, embodiments may include for the RHS of that (34)

m*,z(y _(k))

≥

n*,∇G(y _(k))

−DB.  (36)

It follows

$\begin{matrix}{{{{\langle{m_{k},{\nabla{G\left( y_{k} \right)}}}\rangle} + {2{DB}}} \geq {\langle{m^{*},{\nabla{G\left( y_{k} \right)}}}\rangle}\overset{(a)}{\geq}{{G\left( {y_{k} + m^{*}} \right)} - {G\left( y_{k} \right)}}\overset{(b)}{\geq}{{G\left( y^{*} \right)} - {G\left( y_{k} \right)}}},} & (37)\end{matrix}$

where (a) follows from (33), and (b) follows from (32).

Using the P-Lipschitz continuity property of

$\frac{{dg}_{y_{k},m_{k}}(\xi)}{d\xi}$

(due to P-Lipschitz continuity of ∇G), it is straightforward to see that

$\begin{matrix}{{{{- \frac{{P\gamma}_{k}^{2}}{2}} \leq {{g_{y_{k},m_{k}}\left( \gamma_{k} \right)} - {g_{y_{k},m_{k}}(0)} - {\gamma_{k} \cdot \frac{{dg}_{y_{k},m_{k}}(0)}{d\xi}}}} = {{{G\left( {y_{k} + {\gamma_{k}m_{k}}} \right)} - {G\left( y_{k} \right)} - \gamma_{k}} < m_{k}}},{{\nabla{G\left( y_{k} \right)}} >},} & (38)\end{matrix}$

hence,

$\begin{matrix}{{{{G\left( y_{k + 1} \right)} - {G\left( y_{k} \right)}} \geq {{\gamma_{k}\left( {m_{k},{\nabla{G\left( y_{k} \right)}}} \right)} - \frac{P\; \gamma_{k}^{2}}{2}} \geq {{\gamma_{k}\left( {m_{k},{\nabla{G\left( y_{k} \right)}}} \right)} - \frac{P\; \gamma_{k}^{2}}{2}}\overset{(c)}{\geq}{{\gamma_{k}\left( {{G\left( y^{*} \right)} - {G\left( y_{k} \right)}} \right)} - {2_{\gamma_{k}}{DB}} - \frac{P\; \gamma_{k}^{2}}{2}}},} & (39)\end{matrix}$

where (c) follows from (37), respectively. By rearranging the terms andletting k=K−1, embodiments may include +rCl+x*G(y_K)−G(y{circumflex over( )}*)

${{{G\left( y_{K} \right)} - {G\left( y^{*} \right)}} \geq {{\prod\limits_{j = 0}^{K - 1}\; {\left( {1 - \gamma_{j}} \right)\left( {{G\left( y_{0} \right)} - {G\left( y^{*} \right)}} \right)}} - {2{DB}{\sum\limits_{j = 0}^{K - 1}\gamma_{j}}} - {\frac{P}{2}{\sum\limits_{j = 0}^{K - 1}\gamma_{j}^{2}}}}\overset{(c)}{\geq}{{\left( {{G\left( y_{0} \right)} - {G\left( y^{*} \right)}} \right)\exp \left\{ {- {\sum\limits_{j = 0}^{K - 1}\gamma_{j}}} \right\}} - {2{DB}{\sum\limits_{j = 0}^{K - 1}\gamma_{j}}} - {\frac{P}{2}{\sum\limits_{j = 0}^{K - 1}\gamma_{j}^{2}}}}},$

where (e) is true since 1−x≤e^(−x), ∀x≥0, and G(y₀)≤G(y*) holds due tothe greedy nature of Method 2 and monotonicity of G. In addition, Method2 ensures Σ_(j=0) ^(K−1)γ_(j)=1. It follows

$\begin{matrix}{{{G\left( y_{K} \right)} - {\left( {1 - \frac{1}{e}} \right){G\left( y^{*} \right)}}} \geq {{e^{- 1}{G\left( y_{0} \right)}} - {2{DB}} - {\frac{P}{2}{\sum\limits_{j = 0}^{K - 1}{\gamma_{j}^{2}.}}}}} & (40)\end{matrix}$

This result holds for general stepsizes 0<γ_(j)≤1. The RHS of (40) ismaximized when

${\gamma_{j} = \frac{1}{K}},$

which is the assumed case in Method 2. In addition, embodiments mayinclude y₀=0, and hence, G(y₀)=0. Therefore, embodiments may include

$\begin{matrix}{{{G\left( y_{K} \right)} - {\left( {1 - \frac{1}{e}} \right){G\left( y^{*} \right)}}} \geq {{{- 2}{DB}} - {\frac{P}{2K}.}}} & (41)\end{matrix}$

APPENDIX D

Beyond M/M/1 Queues

D.1 Proof of Lemma 7

For an arbitrary network of M/M/k queues, the traffic load on queue(u,v)∈E is given as

$\begin{matrix}{{{a_{({u,v})}(x)} = \frac{\sum\limits_{r\; \in \; {R:{{({v,u})} \in \; p^{r}}}}{\lambda^{r}{\prod\limits_{k^{\prime} = 1}^{k_{p^{r}}{(v)}}\; \left( {1 - x_{p_{k}^{r},i^{r}}} \right)}}}{k\; \mu_{({u,v})}}},} & (42)\end{matrix}$

which is similar to that of M/M/1 queues, but normalized by the numberof servers, k. Hence, a_((u,v))(X) is submodular in x. For an M/M/kqueue, the probability that an arriving packet finds all servers busyand are forced to wait in queue is given by Erlang C formula [31], whichfollows

$\begin{matrix}{{{P_{({u,v})}^{Q}(x)} = \frac{{b_{({u,v})}(x)}\left( {{ka}_{({u,v})}(x)} \right)^{k}}{{k!}\left( {1 - {a_{({u,v})}(x)}} \right)}},} & (43) \\{where} & \; \\{{{b_{({u,v})}(x)} = \left\lbrack {{\sum\limits_{n = 0}^{k - 1}\frac{\left( {{ka}_{({u,v})}(x)} \right)^{n}}{n!}} + \frac{\left( {{ka}_{({u,v})}(x)} \right)^{k}}{{k!}\left( {1 - {a_{({u,v})}(x)}} \right)}} \right\rbrack^{- 1}},} & (44)\end{matrix}$

is the normalizing factor. In addition, the expected number of packetswaiting for or under transmission is given by

$\begin{matrix}{{\left\lbrack {n_{({u,v})}(x)} \right\rbrack} = {{{ka}_{({u,v})}(x)} + {\frac{{a_{({u,v})}(x)}{P_{({u,v})}^{Q}(x)}}{1 - {a_{({u,v})}(x)}}.}}} & (45)\end{matrix}$

Lee and Cohen [34] show that P_((u,v)) ^(Q)(x) and

[n_((u,v))(x)] are strictly increasing and convex in a_((u,v))(x), fora_((u,v))(x)∈[0,1). In addition, a more direct proof of convexity of

[n_((u,v))(x)] was shown by Grassmann in [35]. Hence, bothP(x):=Σ_((u,v)∈E)P_((u,v)) ^(Q)(x) and N(x):=Σ_((u,v)∈E)

[n_((u,v))(x)] are increasing and convex. Due to Theorem 1, According tosome embodiments, both functions are non-increasing and supermodular inx, and the proof is complete.

D.2 Proof of Lemma 8

Let ρ_((u,v))(x) be the traffic load on queue (u, v)∈E, as defined by(3). It can be shown that the average number of packets in queue (u,v)∈E is of form [31]

$\begin{matrix}{{\left\lbrack {n_{({u,v})}(x)} \right\rbrack} = {{\rho_{({u,v})}(x)} + {\frac{\rho_{({u,v})}^{2}(x)}{2\left( {1 - {\rho_{({u,v})}(x)}} \right)}.}}} & (46)\end{matrix}$

It is easy to see that this function is strictly increasing and convexin ρ_((u,v))(x) for ρ_((u,v))(x)∈[0,1). Due to Theorem 1,N(x):=Σ_((u,v)∈E)

[n_((u,v))(x)] is non-increasing and supermodular in x, and the proof iscomplete.

D.3 Proof of Lemma 9

TABLE 3 Results of ρ_(u,v)(x)'s for different caching configurations.[x₁₁, x₂₁] ρ_(3,2) ρ_(2,1) [0, 0] $\frac{\lambda}{\mu_{3,2}}$$\frac{\lambda \left( {1 - p_{3,2}^{L}} \right)}{\mu_{2,1}}$ [1, 0] 0 0[0, 1] 0 $\frac{\lambda}{\mu_{2,1}}$ [1, 1] 0 0

Consider the network of M/M/1/k queues in FIG. 9, where node 1 isrequesting content 1 from node 3, according to a Poisson process withrate λ. For simplicity, some embodiments may consider the traffic forcontent 1. For queues (2,1) and (3,2), it is easy to verify that theprobability of packet drop at queues (u, v)∈{(2,1), (3,2)} is given by

$\begin{matrix}{{{p_{({u,v})}^{L}\left( \rho_{({u,v})} \right)} = \frac{{\rho_{({u,v})}(x)}^{k}\left( {1 - {\rho_{({u,v})}(x)}} \right)}{1 - {\rho_{({u,v})}(x)}^{k + 1}}},} & (47)\end{matrix}$

where ρ_((u,v))(x) is the traffic load on queue (u, v), and it may becomputed for (2,1) and (3,2) as follows:

$\begin{matrix}{{{\rho_{({2,1})}\left( {x_{11},x_{21}} \right)} = \frac{{\lambda \left( {1 - x_{11}} \right)}\left( {1 - p_{({3,2})}^{L}} \right)}{\mu_{({2,1})}}},} & (48) \\{{\rho_{({3,2})}\left( {x_{11},x_{21}} \right)} = {\frac{{\lambda \left( {1 - x_{11}} \right)}\left( {1 - x_{21}} \right)}{\mu_{({3,2})}}.}} & (49)\end{matrix}$

Using the results reported in Table 3, it is easy to verify that ρ's isnot monotone in x. Hence, no strictly monotone function of ρ's ismonotone in x. In addition, it may be verified that ρ's is neithersubmodular, nor supermodular in x. To show this, let sets A=Ø, andB={(1,1)}, correspond to caching configurations [0,0] and [1,0],respectively. Note that A⊂B, and (2,1)∉B. Since

${{{\rho_{({3,2})}\left( {A\bigcup\left\{ \left( {2,1} \right) \right\}} \right)} - {\rho_{({3,2})}(A)}} = {{{- \frac{\lambda}{\mu_{({3,2})}}}0} = {{\rho_{({3,2})}\left( {B\bigcup\left\{ \left( {2,1} \right) \right\}} \right)} - {\rho_{({3,2})}(B)}}}},$

then ρ_((3,2)) may not be submodular. Consequently, no strictly monotonefunction of ρ_((3,2)) is submodular. Similarly, as

${{{{\rho_{({2,1})}\left( {A\bigcup\left\{ \left( {2,1} \right) \right\}} \right)} - {{\rho_{({2,1})}(A)}\frac{\lambda p_{({3,2})}^{L}}{\mu_{({2,1})}}}}0} = {{\rho_{({2,1})}\left( {B\bigcup\left\{ \left( {2,1} \right) \right\}} \right)} - {\rho_{({2,1})}(B)}}},\rho_{({2,1})}$

may not be supermodular. Thus, no strictly monotone function ofρ_((2,1)) is supermodular.

REFERENCES

-   [1] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák.    Maximizing a monotone submodular function subject to a matroid    constraint. SIAM Journal on Computing, 40(6):1740-1766, 2011.-   [2] Frank P Kelly. Reversibility and stochastic networks. Cambridge    University Press, 2011.-   [3] Robert G Gallager. Stochastic processes: theory for    applications. Cambridge University Press, 2013.-   [4] Randolph Nelson. Probability, Stochastic Processes, and Queueing    Theory: The Mathematics of Computer Performance Modeling. Springer    Publishing Company, Incorporated, 1st edition, 2010.-   [5] Hong Chen and David D Yao. Fundamentals of queueing networks:    Performance, asymptotics, and optimization, volume 46. Springer    Science & Business Media, 2013.-   [6] Van Jacobson, Diana K Smetters, James D Thornton, Michael F    Plass, Nicholas H Briggs, and Rebecca L Braynard. Networking named    content. In CoNEXT, 2009.-   [7] Edmund Yeh, Tracey Ho, Ying Cui, Michael Burd, Ran Liu, and    Derek Leong. VIP: A framework for joint dynamic forwarding and    caching in named data networks. In ICN, 2014.-   [8] Stratis Ioannidis and Edmund Yeh. Adaptive caching networks with    optimality guarantees. In SIGMETRICS, 2016.-   [9] Sem Borst, Varun Gupta, and Anwar Walid. Distributed caching    methods for content distribution networks. In INFOCOM, 2010.-   [10] Mostafa Dehghan, Anand Seetharam, Bo Jiang, Ting He, Theodoros    Salonidis, Jim Kurose, Don Towsley, and Ramesh Sitaraman. On the    complexity of optimal routing and content caching in heterogeneous    networks. In INFOCOM, 2014.-   [11] Nikolaos Laoutaris, Sofia Syntila, and Ioannis Stavrakakis.    Meta methods for hierarchical web caches. In ICPCC, 2004.-   [12] Hao Che, Ye Tung, and Zhijun Wang. Hierarchical web caching    systems: Modeling, design and experimental results. Selected Areas    in Communications, 20(7):1305-1314, 2002.-   [13] Yuanyuan Zhou, Zhifeng Chen, and Kai Li. Second-level buffer    cache management. Parallel and Distributed Systems, 15(6):505-519,    2004.-   [14] Karthikeyan Shanmugam, Negin Golrezaei, Alexandros G Dimakis,    Andreas F Molisch, and Giuseppe Caire. Femtocaching: Wireless    content delivery through distributed caching helpers. Transactions    on Information Theory, 59(12):8402-8413, 2013.-   [15] K P Naveen, Laurent Massoulié, Emmanuel Baccelli, Aline    Carneiro Viana, and Don Towsley. On the interaction between content    caching and request assignment in cellular cache networks. In ATC,    2015.-   [16] Konstantinos Poularakis, George Iosifidis, and Leandros    Tassiulas. Approximation caching and routing methods for massive    mobile data delivery. In GLOBECOM, 2013.-   [17] Qin L v, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker.    Search and replication in unstructured peer-to-peer networks. In    ICS, 2002.-   [18] Edith Cohen and Scott Shenker. Replication strategies in    unstructured peer-to-peer networks. In SIGCOMM, 2002.-   [19] Karthikeyan Shanmugam, Negin Golrezaei, Alexandros G Dimakis,    Andreas F Molisch, and Giuseppe Caire. Femtocaching: Wireless    content delivery through distributed caching helpers. IEEE    Transactions on Information Theory, 2013.-   [20] Alexander A Ageev and Maxim I Sviridenko. Pipage rounding: A    new method of constructing methods with proven performance    guarantee. Journal of Combinatorial Optimization, 8(3):307-328,    2004.-   [21] Stratis Ioannidis and Edmund Yeh. Jointly optimal routing and    caching for arbitrary network topologies. In ICN, 2017.-   [22] Ivan Baev, Rajmohan Raj araman, and Chaitanya Swamy.    Approximation methods for data placement problems. SIAM Journal on    Computing, 38(4):1411-1429, 2008.-   [23] Yair Bartal, Amos Fiat, and Yuval Rabani. Competitive methods    for distributed data management. Journal of Computer and System    Sciences, 51(3):341-358, 1995.-   [24] Lisa Fleischer, Michel X Goemans, Vahab S Mirrokni, and Maxim    Sviridenko. Tight approximation methods for maximum general    assignment problems. In SODA, 2006.-   [25] David Applegate, Aaron Archer, Vijay Gopalakrishnan, Seungjoon    Lee, and Kadangode K Ramakrishnan. Optimal content placement for a    large-scale VoD system. In CoNEXT, 2010.-   [26] Andreas Krause and Daniel Golovin. Submodular function    maximization. Tractability: Practical Approaches to Hard Problems,    2012.-   [27] Pranava R Goundan and Andreas S Schulz. Revisiting the greedy    approach to submodular set function maximization. Optimization    Online, 2007.-   [28] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of    approximations for maximizing submodular set functions—i.    Mathematical Programming, 14(1):265-294, December 1978.-   [29] Jan Vondrák. Optimal approximation for the submodular welfare    problem in the value oracle model. In STOC, 2008.-   [30] George L Nemhauser and Laurence A Wolsey. Best methods for    approximating the maximum of a submodular set function. Mathematics    of operations research, 3(3):177-188, 1978.-   [31] Dimitri P Bertsekas, Robert G Gallager, and Pierre Humblet.    Data networks, volume 2. Prentice-Hall International New Jersey,    1992.-   [32] Gruia Calinescu, Ra Chekuri, Martin PÃ_(i)l, and Jan    VondrÃ_(i)k. Maximizing a submodular set function subject to a    matroid constraint. In IPCO, 2007.-   [33] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Dependent    randomized rounding via exchange properties of combinatorial    structures. In FOCS, 2010.-   [34] Hau Leung Lee and Morris A. Cohen. A note on the convexity of    performance measures of m/m/c queueing systems. Journal of Applied    Probability, 20(4):920?923, 1983.-   [35] W. Grassmann. The convexity of the mean queue size of the m/m/c    queue with respect to the traffic intensity. Journal of Applied    Probability, 20(4):916?919, 1983.

Method, Network, and System:

FIG. 10 is a flow diagram illustrating an example embodiment of a method1000 of the present disclosure. In some embodiments, a network includesnodes communicatively coupled to neighboring nodes via respective linksin the network for distributing cached content.

As illustrated in FIG. 10, in some embodiments, the method 1000 maycollect statistics regarding requests made and paths taken by therequests from source nodes to server nodes via intermediate nodes, thesource nodes, intermediate nodes, and server nodes interconnected byedges having queues with respective queue sizes associated therewith(1002). The requests may include indications of content items to beretrieved. The method may store the content items at the server nodes(1004). Via intermediate nodes, the method may cache the content itemsup to a caching capacity (1006). The method may perform cachingdecisions that determine which of the content items are to be cached atwhich of the intermediate nodes, based upon costs that are monotonic,non-decreasing functions of the sizes of the queues (1008).

In some embodiments, the method may be further configured to perform thecaching decisions further based on a greedy computer-implemented method.In some embodiments, the costs may be associated with one or more of theedges. In some embodiments, the method may determine the costs basedupon a Taylor approximation. In some embodiments, the costs may beexpected costs.

In some embodiments, the method may perform the caching decisionsfurther based on marginal gain. In some embodiments, the marginal gainmay be associated with the costs with respect to the determination ofwhich of the content items are to be cached.

FIG. 11 is a network diagram that illustrates a computer network orsimilar digital processing environment 1100 in which embodiments of thepresent disclosure may be implemented. Client computer(s)/devices 50(e.g., computing devices/display devices) and server computer(s) 60(e.g., a Cloud-based service) provide processing, storage, andinput/output devices executing application programs and the like. Theclient computer(s)/devices 50 (e.g., computing devices/display devices)can also be linked through communications network 70 to other computingdevices, including other client devices/processes 50 and servercomputer(s) 60. The communications network 70 can be part of a remoteaccess network, a global network (e.g., the Internet), a worldwidecollection of computers, local area or wide area networks, and gatewaysthat currently use respective protocols (TCP/IP, BLUETOOTH™, etc.) tocommunicate with one another. Other electronic device/computer networkarchitectures are suitable. According to some embodiments, caching andforwarding may be performed in distributed locations (i.e., at eachnetwork node).

FIG. 12 is a block diagram of an example internal structure of acomputer (e.g., client processor/device 50 or server computers 60) 1200in the computer system or apparatus of FIG. 11. Each computer 50, 60includes a system bus 79, where a bus is a set of hardware lines usedfor data transfer among the components (e.g., entities) of a computer orprocessing system or apparatus. The system bus 79 is essentially ashared conduit that connects different elements of a computer system orapparatus (e.g., processor, disk storage, memory, input/output ports,network ports, etc.) that enables the transfer of information betweenthe elements. Attached to the system bus 79 is an I/O device interface82 for connecting various input and output devices (e.g., keyboard,mouse, displays, printers, speakers, touchscreen etc.) to the computer50, 60. A network interface 86 allows the computer to connect to variousother devices attached to a network (e.g., network 70 of FIG. 11).Memory 90 provides volatile storage for computer software instructions92 and data 94 used to implement embodiments of the present disclosure(e.g., including but not limited to including any of the processor,memory, network node, network, network manager, or any other device,engine, system, module, or controller described herein). Disk storage 95provides non-volatile storage for computer software instructions 92 anddata 94 used to implement some embodiments of the present disclosure.Note, data 94 may be the same between a client 50 and server 60,however, the type of computer software instructions 92 may differbetween a client 50 and a server 60. A central processor unit 84 is alsoattached to the system bus 79 and provides for the execution of computerinstructions.

As illustrated in FIG. 12, in an embodiment, the system or apparatus 800includes a processor 84 and a memory 90 with computer code instructionsstored therein. The memory 90 is operatively coupled to the processor 84such that the computer code instructions configure the processor 84 toimplement content delivery.

In some embodiments, the network of FIG. 11 includes network nodes 50.The network nodes 50 may include source nodes, server nodes associatedwith one or more servers, or intermediate nodes. The source nodes,intermediate nodes, and server nodes may be interconnected by edgeshaving queues with respective queue sizes associated therewith. Thenetwork of FIG. 11 may be configured to collect statistics regardingrequests made and paths taken by the requests from source nodes 50 toserver nodes 60 via intermediate nodes.

The requests may include indications of content items to be retrieved.The content items may be stored at the server nodes 60. The intermediatenodes may be configurable to cache the content items up to a cachingcapacity. The network (which may be at least partially implemented asprocessor unit 84 of FIG. 12) may be configured to perform cachingdecisions that determine which of the content items are to be cached atwhich of the intermediate nodes, based upon costs that are monotonic,non-decreasing functions of the sizes of the queues.

Referring back to FIG. 12, in some embodiments, the processor routines92 and data 94 are a computer program product (generally referenced 92),including a computer readable medium (e.g., a removable storage mediumsuch as one or more DVD-ROM's, CD-ROM's, diskettes, tapes, etc.) thatprovides at least a portion of the software instructions for thedisclosure system. Computer program product 92 may be installed by anysuitable software installation procedure, as is well known in the art.In another embodiment, at least a portion of the software instructionsmay also be downloaded over a cable, communication or wirelessconnection. In other embodiments, the disclosure programs are a computerprogram propagated signal product 107 (shown in FIG. 11) embodied on apropagated signal on a propagation medium (e.g., a radio wave, aninfrared wave, a laser wave, a sound wave, or an electrical wavepropagated over a global network such as the Internet, or othernetwork(s)). Such carrier medium or signals may be employed to provideat least a portion of the software instructions for the presentdisclosure routines/program 92.

Embodiments or aspects thereof may be implemented in the form ofhardware (including but not limited to hardware circuitry), firmware, orsoftware. If implemented in software, the software may be stored on anynon-transitory computer readable medium that is configured to enable aprocessor to load the software or subsets of instructions thereof. Theprocessor then executes the instructions and is configured to operate orcause an apparatus to operate in a manner as described herein.

Further, hardware, firmware, software, routines, or instructions may bedescribed herein as performing certain actions or functions of the dataprocessors. However, it should be appreciated that such descriptionscontained herein are merely for convenience and that such actions infact result from computing devices, processors, controllers, or otherdevices executing the firmware, software, routines, instructions, etc.

It should be understood that the flow diagrams, block diagrams, andnetwork diagrams may include more or fewer elements, be arrangeddifferently, or be represented differently. But it further should beunderstood that certain implementations may dictate the block andnetwork diagrams and the number of block and network diagramsillustrating the execution of the embodiments be implemented in aparticular way.

Accordingly, further embodiments may also be implemented in a variety ofcomputer architectures, physical, virtual, cloud computers, or somecombination thereof, and, thus, the data processors described herein areintended for purposes of illustration only and not as a limitation ofthe embodiments.

While this disclosure has been particularly shown and described withreferences to example embodiments thereof, it is understood by thoseskilled in the art that various changes in form and details may be madetherein without departing from the scope of the disclosure encompassedby the appended claims.

Embodiments solve a technical problem, thereby providing a technicaleffect, or provide technical advantages or functional improvementsincluding making caching decisions to reduce traffic dependent costs atnetwork queues. Some embodiments have provable optimality guarantees,attaining a cost reduction within a factor of approximately 0.67 fromthe optimal cost reduction attained by existing methods. Embodimentsavoid randomization and sampling, attaining provable guarantees at areduced computational complexity. Embodiments approximate arbitraryconvex cost functions via power and Taylor series approximations,leading to a fast and efficient implementation avoiding randomizationand sampling. Embodiments significantly outperforms existing approachesin both caching decisions and computational complexity.

Yet further, embodiments provide technical advantages or functionalimprovements in that such embodiments can directly find application in asystem where content is to be placed in a network with varying demandincluding but not limited to (i) Content delivery networks, (ii)Information centric networks, (iii) Peer-to-peer networks, and (iv)Cloud computing.

While example embodiments are particularly shown and described, it isunderstood by those skilled in the art that various changes in form anddetails may be made therein without departing from the scope of theembodiments encompassed by the appended claims.

What is claimed is:
 1. A network manager configured to: collectstatistics regarding requests made and paths taken by the requests fromsource nodes to server nodes via intermediate nodes, the source nodes,intermediate nodes, and server nodes interconnected by edges havingqueues with respective queue sizes associated therewith, the requestsincluding indications of content items to be retrieved, the contentitems being stored at the server nodes, the intermediate nodesconfigurable to cache the content items up to a caching capacity; andperform caching decisions that determine which of the content items areto be cached at which of the intermediate nodes, based upon costs thatare monotonic, non-decreasing functions of the sizes of the queues. 2.The network manager of claim 1, wherein the network manager is furtherconfigured to perform the caching decisions further based on a greedycomputer-implemented method.
 3. The network manager of claim 1, whereinthe network manager is further configured to perform the cachingdecisions further based on a continuous greedy computer-implementedmethod.
 4. The network manager of claim 1, wherein the costs areassociated with one or more of the edges.
 5. The network manager ofclaim 1, wherein the costs are determined based upon a Taylorapproximation.
 6. The network manager of claim 1, wherein the costs areexpected costs.
 7. The network manager of claim 1, wherein the networkmanager is further configured to perform the caching decisions furtherbased on marginal gain.
 8. The network manager of claim 7, wherein themarginal gain is associated with the costs with respect to thedetermination of which of the content items are to be cached.
 9. Acomputer-implemented method for distributing cached content in anetwork, the computer-implemented method comprising: collectingstatistics regarding requests made and paths taken by the requests fromsource nodes to server nodes via intermediate nodes, the source nodes,intermediate nodes, and server nodes interconnected by edges havingqueues with respective queue sizes associated therewith, the requestsincluding indications of content items to be retrieved; storing thecontent items at the server nodes; caching, by the intermediate nodes,the content items up to a caching capacity; and performing cachingdecisions that determine which of the content items are to be cached atwhich of the intermediate nodes, based upon costs that are monotonic,non-decreasing functions of the sizes of the queues.
 10. Thecomputer-implemented method of claim 9, further comprising furtherperforming the caching decisions further based on a greedycomputer-implemented method.
 11. The computer-implemented method ofclaim 9, further comprising further performing the caching decisionsfurther based on a continuous greedy computer-implemented method. 12.The computer-implemented method of claim 9, wherein the costs areassociated with one or more of the edges.
 13. The computer-implementedmethod of claim 9, wherein the costs are determined based upon a Taylorapproximation.
 14. The computer-implemented method of claim 9, whereinthe costs are expected costs.
 15. The computer-implemented method ofclaim 9, further comprising further performing the caching decisionsfurther based on marginal gain.
 16. The computer-implemented method ofclaim 15, wherein the marginal gain is associated with the costs withrespect to the determination of which of the content items are to becached.
 17. A network node, the network node comprising: a networkinterface; and a processor configured to collect statistics regardingrequests made and paths taken by the requests from source nodes toserver nodes via intermediate nodes, the source nodes, intermediatenodes, and server nodes interconnected by edges having queues withrespective queue sizes associated therewith, the requests includingindications of content items to be retrieved, the content items beingstored at the server nodes, the intermediate nodes configurable to cachethe content items up to a caching capacity, and perform cachingdecisions that determine which of the content items are to be cached atwhich of the intermediate nodes, based upon costs that are monotonic,non-decreasing functions of the sizes of the queues.
 18. The networknode of claim 17, wherein the processor is further configured to performthe caching decisions further based on a greedy computer-implementedmethod.
 19. The network node of claim 17, wherein the processor isfurther configured to perform the caching decisions further based on acontinuous greedy computer-implemented method.
 20. A computer programproduct including a non-transitory computer-readable medium havingprocessor-executable instructions stored thereon, the instructions, whenloaded and executed by a processor, cause a node in a network, the nodecommunicatively coupled to neighboring nodes in the network viarespective links, to: collect statistics regarding requests made andpaths taken by the requests from source nodes to server nodes viaintermediate nodes, the source nodes, intermediate nodes, and servernodes interconnected by edges having queues with respective queue sizesassociated therewith, the requests including indications of contentitems to be retrieved, the content items being stored at the servernodes, the intermediate nodes configurable to cache the content items upto a caching capacity; and perform caching decisions that determinewhich of the content items are to be cached at which of the intermediatenodes, based upon costs that are monotonic, non-decreasing functions ofthe sizes of the queues.